ABSTRACT


THE HILBERT SPACE OF PROBABILITY MASS FUNCTIONS
AND APPLICATIONS ON PROBABILISTIC INFERENCE


Bayramoğlu, Muhammet Fatih
Ph.D., Department of Electrical and Electronics Engineering
Supervisor : Assoc. Prof. Dr. Ali Özgür Yılmaz

September 2011, 123 pages


The Hilbert space of probability mass functions (pmf) is introduced in this thesis. A
factorization method for multivariate pmfs is proposed by using the tools provided by the
Hilbert space of pmfs. The resulting factorization is special for two reasons. First, it
reveals the algebraic relations between the involved random variables. Second, it determines
the conditional independence relations between the random variables. Due to the first
property of the resulting factorization, it can be shown that channel decoders can be employed
in the solution of probabilistic inference problems other than decoding. This approach might
lead to new probabilistic inference algorithms and new hardware options for the
implementation of these algorithms. An example of the new inference algorithms inspired by
the idea of using channel decoders for other inference tasks is a multiple-input
multiple-output (MIMO) detection algorithm whose complexity is the square root of that of
the optimum MIMO detection algorithm.


Keywords: The Hilbert space of pmfs, factorization of pmfs, probabilistic inference, MIMO 
detection, Markov random fields 
THE HILBERT SPACE OF PROBABILITY MASS FUNCTIONS
AND ITS APPLICATIONS ON PROBABILISTIC INFERENCE


Bayramoğlu, Muhammet Fatih
Ph.D., Department of Electrical and Electronics Engineering
Supervisor : Assoc. Prof. Dr. Ali Özgür Yılmaz

September 2011, 123 pages


This thesis presents the Hilbert space of probability mass functions. Using the facilities
provided by this Hilbert space, a method is proposed for factorizing multivariate probability
mass functions. The factorization obtained from this method is special for two reasons.
First, it reveals the algebraic relations among the random variables. Second, it determines
the conditional independence relations among the random variables. Thanks to the first
property, it can be shown that channel decoders can also be used to solve probabilistic
inference problems other than decoding. This approach may lead to new probabilistic inference
algorithms and to new hardware options for implementing these algorithms. One example of the
algorithms inspired by the idea of using decoders for inference tasks other than decoding is
a multiple-input multiple-output detection algorithm whose complexity is the square root of
that of the optimum algorithm.


Keywords: The Hilbert space of probability mass functions, factorization of probability mass
functions, probabilistic inference, multiple-input multiple-output detection, Markov random
fields
PREFACE


This thesis summarizes the research work carried out over six years starting from September
2005. The research topic arose while I was trying to develop an analysis method for the
convergence rate of the iterative sum-product algorithm. Since the messages (beliefs) passed
between the nodes in the iterative sum-product algorithm are probability mass functions
(pmf), I thought that representing the pmfs in a Hilbert space structure would prove useful
in the analysis of the sum-product algorithm. Analyzing the convergence of the sum-product
algorithm would be an application of the norm in the Hilbert space of pmfs. However, I later
noticed that the inner product has much more interesting applications, and I preferred
focusing on the applications of the inner product to dealing with the convergence, which led
to this thesis.

In order to read the thesis, a basic understanding of inner product spaces and finite fields
is necessary. Anybody with this background can follow the chapters from the second to the
fifth. I believe that these chapters are the core of the thesis. Chapter 6 contains some
applications from communication theory and might require a communication theory background.

This thesis is related to probability theory. Probability theory is an area which is close to
the border between science and belief. Although Laplace's book on celestial mechanics fails
to mention God, my explanation of the relation between probability and willpower leads me to
believe in God, and I would like to start the rest of the thesis with a quote from the
translation of the Qur'an which explains what science is to me: “Glory be to You, we have no
knowledge except what You have taught us. Verily, it is You (Allah), the All-Knower, the
All-Wise”.
CHAPTER 1


INTRODUCTION 


1.1 Motivation 

A linear vector space structure over a set provides algebraic tools, such as addition and
scaling, to be carried out on the elements of the set. If a vector space can be endowed with
an inner product then it becomes an inner product space. An inner product provides geometric
concepts such as norm, distance, angle, and projection. If every Cauchy sequence in an inner
product space converges with respect to the norm induced by the inner product then the inner
product space becomes a Hilbert space. Needless to say, a Hilbert space structure is very
useful and finds applications in diverse fields of science. Communication theory is no
exception. For instance, the signal space representation in communication theory relies on
the Hilbert space structure constructed over the set of square integrable functions.

One of the mathematical objects used very frequently in communication and information
theories is the probability mass function (pmf), which is the discrete equivalent of the
probability density function. Although pmfs are so frequently used in communication and
information theories, a Hilbert space structure for them was missing. A Hilbert space of pmfs
might have many interesting applications.

A possible application of the Hilbert space of probability mass functions might be analyzing
the characteristics of a multivariate pmf. An important characteristic of a multivariate pmf
is the set of conditional independence relations imposed by it. The conditional independence
relations imposed by a multivariate pmf are determined by the factorization of the pmf into
local functions*, as explained in [18, 19].

* Local functions are functions (not necessarily pmfs) which have fewer arguments than the
original multivariate pmf.
The factorization structure of a multivariate pmf into local functions also determines the
algorithms which can perform inference on the pmf, in other words, maximize or marginalize
the pmf. The sum-product algorithm, which is also called belief propagation, and the
max-product algorithm effectively marginalize or maximize a multivariate pmf by exploiting
the pmf's factorization structure 12. Modern decoding algorithms such as low-density
parity-check decoding and turbo decoding, which have become highly popular in the last
decade, rely on this fact.
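
As a rough illustration of why the factorization structure matters for marginalization, the sketch below marginalizes a toy function of three variables that factors along a chain. The local functions `f` and `g`, the alphabet size `Q`, and the chain structure itself are hypothetical choices made for this example; they are not taken from the thesis.

```python
from itertools import product

Q = 3  # alphabet size, an arbitrary choice for illustration


# Hypothetical strictly positive local functions of a chain x1 - x2 - x3.
def f(x1, x2):
    return 1.0 + x1 * x2


def g(x2, x3):
    return 1.0 + ((x2 + x3) % Q) * (x2 + 1)


def marginal_brute_force():
    """Marginal of x3 obtained by summing the full joint table (Q**3 terms)."""
    m = [0.0] * Q
    for x1, x2, x3 in product(range(Q), repeat=3):
        m[x3] += f(x1, x2) * g(x2, x3)
    total = sum(m)
    return [v / total for v in m]


def marginal_by_factors():
    """Same marginal, with the sum over x1 pushed inside (about 2 * Q**2 terms)."""
    # Message from f towards x2: sum out x1 once for every value of x2.
    msg = [sum(f(x1, x2) for x1 in range(Q)) for x2 in range(Q)]
    m = [sum(msg[x2] * g(x2, x3) for x2 in range(Q)) for x3 in range(Q)]
    total = sum(m)
    return [v / total for v in m]


print(marginal_brute_force())
print(marginal_by_factors())
```

Both functions return the same marginal; the second one never builds the joint table, which is the saving that message passing algorithms exploit on larger factorizations.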

Some multivariate pmfs, for instance the pmf resulting from a hidden Markov model, have an
apparent factorization structure. However, one cannot be sure whether this factorization
structure is the “best” possible factorization or not. On the other hand, some pmfs, for
instance the pmfs obtained empirically, might not have an apparent factorization structure at
all. Therefore, developing a method which obtains the factorization of a multivariate pmf
systematically would prove useful in many areas.


1.2 Contributions 

The first contribution of this thesis is the derivation of the Hilbert space structure for
pmfs. The Hilbert space of pmfs not only provides a vectorial representation of evidence but
also proves to be a useful tool in analyzing pmfs.

The second contribution of this thesis is a systematic method for obtaining the factorization
of a multivariate pmf. The resulting factorization is unique and is the ultimate
factorization possible. Hence, we call the resulting factorization the canonical
factorization. The canonical factorization of a multivariate pmf is obtained by projecting
the pmf onto orthogonal basis pmfs of the Hilbert space of pmfs. Hence, this factorization
method relies heavily on the Hilbert space of pmfs.

The basis pmfs mentioned in the paragraph above are special pmfs whose value is determined
only by a linear combination of their arguments. In order to be able to talk about linear
combinations of arguments, addition and multiplication must be well defined between the
arguments of the pmf. Hence, the canonical factorization of a pmf can be obtained only if the
pmf is a pmf of finite-field-valued random variables. This is an important limitation of the
canonical factorization.

The property of the basis pmfs mentioned in the previous paragraph causes an important
limitation, but it also leads to the third and probably the most important contribution of
the thesis. Since the basis pmfs are functions of linear combinations of their arguments, the
canonical factorization reveals the algebraic dependencies between the random variables.
Thanks to this fact, it can be shown that channel decoders can be employed as an apparatus
for tasks beyond decoding. This idea leads to new hardware options as well as new inference
algorithms.

The fourth contribution of the thesis is an application of the idea explained in the
paragraph above. This contribution is a multiple-input multiple-output (MIMO) detection
algorithm which employs the decoder of a tail-biting convolutional code as a processing
device. This algorithm is an approximate soft-input soft-output MIMO detection algorithm
whose complexity is the square root of that of the optimum MIMO detection algorithm.

The final contribution of the thesis is another property of the canonical factorization. It can 
be shown that the conditional dependence relationships imposed by a multivariate pmf can 
be determined from the canonical factorization of the pmf. In other words, the conditional 
independence relationships imposed by a pmf can be determined by using the geometric tools 
provided by the Hilbert space of pmfs. This property of the canonical factorization might lead 
to applications in experimental fields such as bioinformatics dealing with large amounts of 
data. 


1.3 Comparison to earlier work 

A Hilbert space of probability density functions was first presented in the literature in a
very different area of science, stochastic geology, in 0. Their derivation is for a class of
continuous probability density functions, whereas our derivation is for pmfs. Although the
resulting Hilbert space structures in both derivations are quite similar, our derivation is
independent of theirs. Furthermore, we provide many applications of the Hilbert space of pmfs
on probabilistic inference.

The canonical factorization proposed in this thesis can be compared to the factorization of
pmfs provided by the Hammersley-Clifford theorem [18, 19]. Both the Hammersley-Clifford
theorem and the canonical factorization can completely determine the conditional independence
relationships imposed by a pmf. However, the Hammersley-Clifford theorem does not highlight
the algebraic dependence relationships between random variables, while the canonical
factorization does. Moreover, the canonical factorization is unique whereas the factorization
of the Hammersley-Clifford theorem is not.

The results obtained in this thesis can be located in the factor graph literature as follows.
Factor graphs are bipartite graphical models which represent the factorization of a pmf [1].
Bipartite graphs were first employed by Tanner to describe low complexity codes in [5]. A
very crucial step in achieving the factor graph representation is the Ph.D. thesis of Wiberg
[6, 7]. In his thesis Wiberg showed the connection between various codes and decoding
algorithms by introducing hidden state nodes to the graphs described by Tanner and
characterized the message passing algorithms running on these graphs. Local constraints in @
are behavioral constraints, such as parity check constraints. Factor graphs are the
generalization of the graphical models introduced in IQ, allowing local constraints to be
arbitrary functions rather than behavioral constraints [1].

The canonical factorization proposed in this thesis can also be represented by a factor
graph. Moreover, the factor functions appearing in the canonical factorization can be
transformed into usual parity check constraints by introducing some auxiliary variables.
Therefore, the factor graph representing the canonical factorization can be transformed into
a Tanner graph by introducing some auxiliary variable nodes which are very different from the
hidden state nodes introduced in O. This is essentially an explanation of the claim that
channel decoders can be employed for inference tasks beyond decoding.


1.4 Outline 

After this chapter, the thesis continues with the introduction of the Hilbert space of pmfs
in Chapter 2. The Hilbert space of pmfs is the main tool to be used throughout the thesis.
The canonical factorization is introduced in Chapter 3. Chapter 4 investigates the properties
and special cases of the canonical factorization. Chapter 5 explains how a channel decoder
can be used for probabilistic inference tasks other than its own purpose. This explanation is
based on the canonical factorization. Some possible consequences of this result are also
explained in Chapter 5. Chapter 6 provides some basic examples from communication theory on
the use of channel decoders for inference tasks beyond decoding. The MIMO detector which uses
the decoder of a tail-biting convolutional code is also introduced in this chapter. Chapter 7
shows that the conditional independence relations can be completely determined from the
canonical factorization. The thesis is concluded with some possible future directions in
Chapter 8. For the sake of neatness, some proofs and derivations are collected in the
Appendix.

1.5 Some remarks on notation 

Throughout the thesis we denote the deterministic variables with lowercase letters and random 
variables with uppercase letters. We represent functions of multiple variables as functions of 
vectors and denote vectors with boldface letters. Lowercase boldface letters denote
deterministic vectors and capital boldface letters denote random vectors. All vectors
encountered in
the thesis are row vectors except a few cases in Chapter 0 

Matrices are also denoted with capital boldface letters which might lead to a confusion with 
random vectors. Throughout the thesis, we used V, W, X, Y, and Z to denote random vectors. 
All the other capital boldface letters are matrices. 

Unfortunately, many different types of addition appear in the thesis, such as finite field
addition, real number addition, vector addition, and even direct sum of subspaces. We reserve
the $\oplus$ symbol for the direct sum of subspaces for the sake of consistency with the
linear algebra literature. We use the $\boxplus$ symbol for the vectorial addition operation
of pmfs, which is defined in Chapter 2. We have to use the remaining $+$ symbol for all the
other addition operations, such as real number addition, finite field addition, and vectorial
addition in $\mathbb{R}^q$. Fortunately, the type of the addition employed can be determined
from the types of the operands.

A possible confusion might arise while using the summation symbol $\sum$. For instance,
$\sum_{i=1}^{N} p_i(x)$ might refer to both $p_1(x) + p_2(x) + \ldots + p_N(x)$ and
$p_1(x) \boxplus p_2(x) \boxplus \ldots \boxplus p_N(x)$, which are really two different
summations. In order to avoid this confusion we denote the latter summation with
$\overset{\boxplus}{\sum}_{i=1}^{N} p_i(x)$, although summations like the former are never
encountered in the thesis.
CHAPTER 2


THE HILBERT SPACE OF PROBABILITY MASS FUNCTIONS


2.1 Introduction 

The Hilbert space of probability mass functions (pmf), which is the main tool to be employed
in the thesis, is introduced in this chapter. Throughout the thesis we are only interested in
the pmfs of finite-field-valued random variables. Therefore, we first define what a
finite-field-valued random variable is in Section 2.2. We introduce the set of pmfs on which
we construct the Hilbert space in Section 2.3. Then we construct the algebraic and geometric
structures over this set in Section 2.4 and Section 2.5 respectively. Section 2.6 emphasizes
the differences between the Hilbert space of random variables and the Hilbert space of pmfs
in order to avoid possible confusion. Finally, in Section 2.7 the idea of the construction of
the Hilbert space is repeated on the set of multivariate pmfs.

2.2 Finite-Field-Valued Random Variables 

Traditionally a random variable is a mapping from the event space to the real or complex
field. However, in some experiments, e.g., experiments with discrete event spaces, it might
be useful to map the outcomes of the experiment to a finite (Galois) field. Such a mapping
allows meaningful algebraic operations to be carried out between the outcomes of different
experiments, for instance as in [32]. A finite-field-valued random variable is defined below.

Definition 1 Finite-field-valued random variable: Let $\Omega$ be the event space of an
experiment and $\mathbb{F}_q = GF(q)$ be the finite field of $q$ elements. Moreover, let a
function $X : \Omega \to \mathbb{F}_q$ be defined as

\[ X(\omega \in \mathcal{E}_i) = i \quad \forall i \in \mathbb{F}_q, \]

where $\{\mathcal{E}_i : i \in \mathbb{F}_q\}$ are events (subsets of $\Omega$) of this
experiment. The function $X$ is called an $\mathbb{F}_q$-valued random variable if the events
$\{\mathcal{E}_i : i \in \mathbb{F}_q\}$ are mutually exclusive and collectively exhaustive,
i.e.,

\[ \mathcal{E}_i \neq \mathcal{E}_j \implies \mathcal{E}_i \cap \mathcal{E}_j = \emptyset \quad \forall i, j \in \mathbb{F}_q, \]

\[ \bigcup_{i \in \mathbb{F}_q} \mathcal{E}_i = \Omega. \]

Actually, we do not need to restrict ourselves to finite-field-valued random variables in
this chapter since the ideas presented in this chapter can be applied to any discrete random
variable. We need the concept of finite-field-valued random variables starting from the next
chapter. However, we introduce the finite-field-valued random variables starting from this
chapter in order to make the presentation simpler.


2.3 The Set of Strictly Positive Probability Mass Functions 

Many different experiments can be represented with an $\mathbb{F}_q$-valued random variable.
All these experiments may lead to different pmfs. Furthermore, we may have different pmfs
even for the same experiment if the outcome is conditioned on some other event. Let
$\mathcal{P}_{\mathbb{F}_q}$ be the set of all strictly positive pmfs that an
$\mathbb{F}_q$-valued random variable might possess, i.e.,

\[ \mathcal{P}_{\mathbb{F}_q} = \Big\{ p(x) : \mathbb{F}_q \to (0,1) \subset \mathbb{R} \ \text{s.t.} \ \sum_{x \in \mathbb{F}_q} p(x) = 1 \Big\}. \tag{2.1} \]

The Hilbert space of pmfs is going to be constructed on $\mathcal{P}_{\mathbb{F}_q}$. This
set excludes the pmfs which take the value zero for some values of their argument. The reason
for this restriction will become clear after scalar multiplication is defined on this set.

We are going to represent pmfs with lowercase letters such as $p(x)$, $r(x)$, or $s(x)$.
These may be the pmfs of random variables representing different experiments, or the pmfs of
the same random variable conditioned on different events.
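2.3.1 The normalization operator


We employ a normalization operator to obtain pmfs from strictly positive real-valued
functions by scaling them. We denote this normalization operator with
$\mathcal{C}_{\mathbb{F}_q}\{\cdot\}$ and define it as

\[ \mathcal{C}_{\mathbb{F}_q}\{\alpha(x)\} : \mathcal{F}_{\mathbb{F}_q} \to \mathcal{P}_{\mathbb{F}_q} \triangleq \frac{\alpha(x)}{\sum_{i \in \mathbb{F}_q} \alpha(i)}, \tag{2.2} \]

where $\mathcal{F}_{\mathbb{F}_q}$ denotes the set of all functions from $\mathbb{F}_q$ to
$\mathbb{R}^{+}$ and $\alpha(x)$ is a function in $\mathcal{F}_{\mathbb{F}_q}$. An obvious
property of the operator $\mathcal{C}_{\mathbb{F}_q}\{\cdot\}$ that we exploit frequently is
given below

\[ \mathcal{C}_{\mathbb{F}_q}\{\beta \alpha(x)\} = \mathcal{C}_{\mathbb{F}_q}\{\alpha(x)\}, \tag{2.3} \]

where $\beta$ is any positive number.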
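
The normalization operator and the scale invariance stated in (2.3) can be sketched in a few lines. This is only an illustration: the function name `normalize`, the list representation of a function on the field, and the numbers are assumptions made for the example.

```python
def normalize(alpha):
    """Scale a strictly positive function on F_q, stored as a list, into a pmf."""
    if any(a <= 0 for a in alpha):
        raise ValueError("all values must be strictly positive")
    total = sum(alpha)
    return [a / total for a in alpha]


alpha = [2.0, 6.0, 12.0]                          # hypothetical positive function on F_3
p = normalize(alpha)
p_scaled = normalize([5.0 * a for a in alpha])    # beta = 5 should not change the result
print(p)
print(p_scaled)
```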


2.4 The Algebraic Structure over $\mathcal{P}_{\mathbb{F}_q}$


The foundation of the Hilbert space of PMFs is the addition operation. Hence, the definition 
of the addition should be meaningful in the sense of probabilistic inference in order to take 
advantage of the Hilbert space structure for inference problems. 


The addition operation is inspired by the following scenario. Assume that we receive
information about a uniformly distributed source $X$ via two independent channels with
outputs $y_1$ and $y_2$ as depicted in Figure 2.1. Let $p(x) = \Pr\{X = x \mid y_1\}$,
$q(x) = \Pr\{X = x \mid y_2\}$, and $r(x) = \Pr\{X = x \mid y_1, y_2\}$. Since $X$ is
uniformly distributed, $r(x)$ can be derived as

\[ r(x) = \frac{p(x) q(x)}{\sum_{x \in \mathbb{F}_q} p(x) q(x)} \tag{2.4} \]

\[ \phantom{r(x)} = \mathcal{C}_{\mathbb{F}_q}\{p(x) q(x)\} \tag{2.5} \]

by employing Bayes' theorem.

The PMFs $p(x)$ and $q(x)$ represent the evidence about the source $X$ when only $y_1$ or
$y_2$ is known respectively. On the other hand, $r(x)$ represents the total evidence when
both outputs are known. In a way, $r(x)$ is obtained by summing $p(x)$ and $q(x)$. Hence,
(2.4) can be adopted as the definition of addition. For any $p(x)$ and $q(x)$ in
$\mathcal{P}_{\mathbb{F}_q}$, their addition is denoted by $\boxplus$ and defined as

\[ p(x) \boxplus q(x) = \mathcal{C}_{\mathbb{F}_q}\{p(x) q(x)\}. \tag{2.6} \]





Figure 2.1: The scenario for explaining the meaning of addition operation. 


The definition of the addition operation is such a critical point of this thesis that the rest of the 
thesis will be built upon this definition. 
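
The two-channel scenario behind the addition operation can be checked numerically. In the sketch below the likelihood values and the helper names `normalize` and `add` are hypothetical; the only point is that adding the two single-channel posteriors reproduces the posterior obtained directly from Bayes' theorem under a uniform prior.

```python
def normalize(alpha):
    """Scale a strictly positive list into a pmf."""
    total = sum(alpha)
    return [a / total for a in alpha]


def add(p, q):
    """Addition of two pmfs: pointwise product followed by normalization, as in (2.6)."""
    return normalize([a * b for a, b in zip(p, q)])


# Hypothetical likelihoods Pr{y1 | X = x} and Pr{y2 | X = x} for x in F_3.
lik1 = [0.6, 0.3, 0.1]
lik2 = [0.2, 0.5, 0.3]

# With a uniform prior, each single-channel posterior is the normalized likelihood.
p = normalize(lik1)
q = normalize(lik2)

# Posterior given both outputs, computed directly from Bayes' theorem
# (independent channels, uniform prior).
r_bayes = normalize([l1 * l2 for l1, l2 in zip(lik1, lik2)])
r_added = add(p, q)
print(r_bayes)
print(r_added)
```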

This definition of addition operation is the same as parallel information combining operation 
as defined in ifT^ and message computation at variable nodes in the sum-product algorithm 

m. 

Defining the addition operation also forces the scalar multiplication to have a form that is
consistent with the addition. The scalar multiplication, which is denoted by $\boxtimes$,
should satisfy the relation below for positive integers $n$

\[ n \boxtimes p(x) = \underbrace{p(x) \boxplus p(x) \boxplus \ldots \boxplus p(x)}_{n \text{ times}} = \mathcal{C}_{\mathbb{F}_q}\{(p(x))^{n}\}. \tag{2.7} \]

Generalizing (2.7) to any $\alpha$ in $\mathbb{R}$ leads to the definition of scalar
multiplication below

\[ \alpha \boxtimes p(x) = \mathcal{C}_{\mathbb{F}_q}\{(p(x))^{\alpha}\}. \tag{2.8} \]

In order to be able to scale $p(x)$ with negative coefficients it is necessary that
$p(x) \neq 0$ for any $x$ in $\mathbb{F}_q$. Hence, an open interval is used rather than a
closed interval in the definition of $\mathcal{P}_{\mathbb{F}_q}$ in (2.1).
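
A small numerical sketch of the scalar multiplication follows. The helper names and the pmf values are assumptions for illustration. It checks that multiplying by 3 agrees with adding a pmf to itself three times, and that multiplying by -1 yields a pmf whose sum with the original is uniform, which is only possible because no probability is zero.

```python
def normalize(alpha):
    total = sum(alpha)
    return [a / total for a in alpha]


def add(p, q):
    """Pointwise product followed by normalization."""
    return normalize([a * b for a, b in zip(p, q)])


def scale(alpha, p):
    """Scalar multiplication: raise every probability to the power alpha, then normalize."""
    return normalize([v ** alpha for v in p])


p = [0.5, 0.3, 0.2]                 # hypothetical strictly positive pmf on F_3

three_p = scale(3, p)
p_added_thrice = add(add(p, p), p)  # should coincide with three_p

minus_p = scale(-1, p)              # requires p(x) != 0 for every x
neutral = add(p, minus_p)           # should be the uniform pmf
print(three_p, p_added_thrice)
print(neutral)
```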

Theorem 2.1 The set $\mathcal{P}_{\mathbb{F}_q}$ together with the operations $\boxplus$ and
$\boxtimes$ forms a linear vector space over $\mathbb{R}$.


Proof. The closure of $\mathcal{P}_{\mathbb{F}_q}$ under both operations is ensured by the
normalization operators in their definitions. The commutativity and associativity are obvious
from the definition of the $\boxplus$ operation. The neutral element with respect to (w.r.t.)
the addition operation is the uniform distribution given by

\[ \theta(x) = \frac{1}{q}. \]

Consequently, the additive inverse of $p(x)$, which is denoted by $\boxminus p(x)$, is

\[ \boxminus p(x) = \mathcal{C}_{\mathbb{F}_q}\Big\{ \frac{1}{p(x)} \Big\} = -1 \boxtimes p(x). \]

The compatibility of scalar multiplication with the multiplication in $\mathbb{R}$ is obvious
from (2.8). The distributivity of scalar multiplication over scalar and vector additions is a
direct consequence of the definitions of scalar multiplication and addition. Clearly, 1 is
the identity element of scalar multiplication. Hence, $\mathcal{P}_{\mathbb{F}_q}$ becomes a
linear vector space over $\mathbb{R}$. ■
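
The axioms invoked in the proof can also be spot-checked numerically. The sketch below is not a proof; it evaluates each identity for one arbitrary pair of pmfs and one arbitrary pair of scalars, all of which are hypothetical values chosen for the example.

```python
def normalize(alpha):
    total = sum(alpha)
    return [a / total for a in alpha]


def add(p, r):
    return normalize([a * b for a, b in zip(p, r)])


def scale(alpha, p):
    return normalize([v ** alpha for v in p])


def close(u, v, tol=1e-12):
    return all(abs(a - b) < tol for a, b in zip(u, v))


p = [0.5, 0.3, 0.2]
r = [0.1, 0.2, 0.7]
theta = [1.0 / 3.0] * 3          # uniform pmf, the neutral element
a, b = 1.7, -0.4                 # arbitrary real scalars

checks = {
    "commutativity": close(add(p, r), add(r, p)),
    "neutral element": close(add(p, theta), p),
    "additive inverse": close(add(p, scale(-1, p)), theta),
    "distributivity over vector addition":
        close(scale(a, add(p, r)), add(scale(a, p), scale(a, r))),
    "distributivity over scalar addition":
        close(scale(a + b, p), add(scale(a, p), scale(b, p))),
    "compatibility with real multiplication":
        close(scale(a, scale(b, p)), scale(a * b, p)),
    "identity scalar": close(scale(1, p), p),
}
for name, ok in checks.items():
    print(name, ok)
```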

Example 2.1 The algebraic relations between some conditional pmfs are examined in this
example, in which a combined experiment takes place in a two dimensional universe.

First a fair die with three faces¹ is rolled. Then one of three urns is selected according to
the outcome of the die rolling experiment. These three urns contain balls of six different
colors. The number of balls of each color in each urn is given in the table below. A ball is
drawn from the selected urn and put back a few times.

Table 2.1: Number of balls of each color in each urn mentioned in Example 2.1

         Red (R)   Yellow (Y)   Orange (O)   Blue (B)   Green (G)   Purple (P)
Urn 1       1          9            9           3           1            1
Urn 2       9          1            9           1           3            1
Urn 3       9          9            1           1           1            3

Let the event space of the die rolling experiment be mapped to an $\mathbb{F}_3$-valued random
variable $X$ such that the faces 1, 2, and 3 are mapped to 0, 1, and 2 in $\mathbb{F}_3$. Let
six pmfs of $X$ conditioned on the color of the ball drawn be defined as follows when a single
ball is drawn.

\[ \begin{aligned}
r(x) &= \Pr\{X = x \mid R\}, & y(x) &= \Pr\{X = x \mid Y\}, & o(x) &= \Pr\{X = x \mid O\}, \\
b(x) &= \Pr\{X = x \mid B\}, & g(x) &= \Pr\{X = x \mid G\}, & p(x) &= \Pr\{X = x \mid P\}.
\end{aligned} \tag{2.9} \]

For instance, assume that a ball is drawn from the selected urn and put back six times and
the colors of the balls drawn are B, B, G, G, G, and Y. Then the a posteriori pmf of $X$ can
be expressed by using the definitions of addition and scalar multiplication in
$\mathcal{P}_{\mathbb{F}_3}$ as

\[ \Pr\{X = x \mid B, B, G, G, G, Y\} = 2 \boxtimes b(x) \boxplus 3 \boxtimes g(x) \boxplus y(x). \]

¹ We can have a die with three faces in a two dimensional universe. This is the reason why
the experiment takes place in a two dimensional universe.


Now assume that the process of drawing a ball and replacing it is repeated three times and
the colors of the drawn balls are R, Y, and O. Then due to the symmetry in the problem the a
posteriori pmf of $X$ is

\[ \Pr\{X = x \mid R, Y, O\} = \frac{1}{3}. \]

The vectorial representation of this equation in $\mathcal{P}_{\mathbb{F}_3}$ is

\[ r(x) \boxplus y(x) \boxplus o(x) = \theta(x). \tag{2.10} \]

Similarly, $b(x)$, $g(x)$, and $p(x)$ are also related as

\[ b(x) \boxplus g(x) \boxplus p(x) = \theta(x). \tag{2.11} \]


Now assume that the process of drawing a ball and replacing it is repeated twice. The a
posteriori pmf of $X$ given that the colors of the balls are R and Y is

\[ \Pr\{X = x \mid R, Y\} = r(x) \boxplus y(x) =
\begin{cases} 1/11, & x = 0 \\ 1/11, & x = 1 \\ 9/11, & x = 2 \end{cases} \tag{2.12} \]

and the a posteriori pmf of $X$ given that both balls are P is

\[ \Pr\{X = x \mid P, P\} = 2 \boxtimes p(x) =
\begin{cases} 1/11, & x = 0 \\ 1/11, & x = 1 \\ 9/11, & x = 2. \end{cases} \tag{2.13} \]

Combining these last two results yields

\[ r(x) \boxplus y(x) = 2 \boxtimes p(x). \tag{2.14} \]

The following two relations can be obtained similarly.

\[ r(x) \boxplus o(x) = 2 \boxtimes g(x) \tag{2.15} \]

\[ o(x) \boxplus y(x) = 2 \boxtimes b(x) \tag{2.16} \]
Actually, the algebraic relations (2.10), (2.11), (2.14), (2.15), and (2.16) are all obtained
by using only the basic tools of probability and the definitions of addition and scalar
multiplication in $\mathcal{P}_{\mathbb{F}_3}$. We did not make use of the algebraic structure
defined on $\mathcal{P}_{\mathbb{F}_3}$ to derive these relations. Further algebraic relations
between the conditional pmfs defined in (2.9) can be obtained by using (2.10), (2.11), (2.14),
(2.15), (2.16) and exploiting the algebraic structure of $\mathcal{P}_{\mathbb{F}_3}$. Some of
these relations are given below.

\[ \begin{aligned}
o(x) &= -2 \boxtimes p(x), & y(x) &= -2 \boxtimes g(x), & r(x) &= -2 \boxtimes b(x), \\
p(x) &= -\tfrac{1}{2} \boxtimes o(x), & g(x) &= -\tfrac{1}{2} \boxtimes y(x), & b(x) &= -\tfrac{1}{2} \boxtimes r(x).
\end{aligned} \tag{2.17} \]
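
The relations of this example can be verified with exact rational arithmetic. The sketch below takes the ball counts from Table 2.1; the helper names are assumptions for illustration, and only integer scalars are used so that every result stays an exact fraction.

```python
from fractions import Fraction

# Balls of each color in urns 1, 2, 3 (Table 2.1); every urn holds 24 balls.
COUNTS = {
    "R": (1, 9, 9), "Y": (9, 1, 9), "O": (9, 9, 1),
    "B": (3, 1, 1), "G": (1, 3, 1), "P": (1, 1, 3),
}


def normalize(values):
    total = sum(values)
    return tuple(Fraction(v) / total for v in values)


def add(p, r):
    return normalize([a * b for a, b in zip(p, r)])


def scale(n, p):
    """Integer scalar multiplication; Fraction ** int stays exact, also for n < 0."""
    return normalize([a ** n for a in p])


# Pr{X = x | color}: the urn is chosen uniformly, so the posterior is proportional
# to the number of balls of that color in each urn.
pmf = {color: normalize(counts) for color, counts in COUNTS.items()}
theta = normalize((1, 1, 1))
r, y, o, b, g, p = (pmf[c] for c in "RYOBGP")

print(add(add(r, y), o) == theta)          # (2.10)
print(add(add(b, g), p) == theta)          # (2.11)
print(add(r, y) == scale(2, p))            # (2.14)
print(add(r, o) == scale(2, g))            # (2.15)
print(add(o, y) == scale(2, b))            # (2.16)
print(scale(-2, p) == o)                   # first relation of (2.17)

# Posterior after drawing B, B, G, G, G, Y with replacement.
posterior = add(add(scale(2, b), scale(3, g)), y)
print(posterior)
```

The relations with the scalar -1/2 in (2.17) follow from the ones with -2 by multiplying both sides by -1/2, so they are not checked separately here.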

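Example 2.2 Since it is proven that $\mathcal{P}_{\mathbb{F}_q}$ is a linear vector space, we
can talk about linear mappings (transformations) from $\mathcal{P}_{\mathbb{F}_q}$ to other
linear vector spaces. In this example we are going to provide a familiar example of such a
mapping.

The log-likelihood ratio (LLR), which is defined for binary valued pmfs as

\[ \Lambda\{p(x)\} = \log \frac{p(0)}{p(1)}, \tag{2.18} \]

is a frequently employed tool in detection theory and channel decoding. For any
$\alpha, \beta \in \mathbb{R}$ and $p(x), r(x) \in \mathcal{P}_{\mathbb{F}_2}$,

\[ \begin{aligned}
\Lambda\{\alpha \boxtimes p(x) \boxplus \beta \boxtimes r(x)\}
&= \log \frac{\mathcal{C}_{\mathbb{F}_2}\{(p(x))^{\alpha} (r(x))^{\beta}\}\big|_{x=0}}
             {\mathcal{C}_{\mathbb{F}_2}\{(p(x))^{\alpha} (r(x))^{\beta}\}\big|_{x=1}} \\
&= \log \frac{(p(0))^{\alpha} (r(0))^{\beta}}{(p(1))^{\alpha} (r(1))^{\beta}} \\
&= \alpha \Lambda\{p(x)\} + \beta \Lambda\{r(x)\}.
\end{aligned} \]

Hence, the LLR is a linear mapping from $\mathcal{P}_{\mathbb{F}_2}$ to $\mathbb{R}$.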
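
The linearity of the LLR can be checked numerically for binary pmfs. The pmf values, the scalars, and the helper names below are hypothetical; the LLR is taken as the logarithm of the probability of 0 over the probability of 1.

```python
import math


def normalize(alpha):
    total = sum(alpha)
    return [a / total for a in alpha]


def add(p, r):
    return normalize([a * b for a, b in zip(p, r)])


def scale(alpha, p):
    return normalize([v ** alpha for v in p])


def llr(p):
    """Log-likelihood ratio of a binary pmf stored as [p(0), p(1)]."""
    return math.log(p[0] / p[1])


p = [0.8, 0.2]
r = [0.35, 0.65]
a, b = 2.5, -1.2        # arbitrary real coefficients

lhs = llr(add(scale(a, p), scale(b, r)))
rhs = a * llr(p) + b * llr(r)
print(lhs, rhs)
```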

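2.5 The Geometric Structure over $\mathcal{P}_{\mathbb{F}_q}$

The geometric structure over a vector space is defined by means of an inner product. We are
going to define an inner product on $\mathcal{P}_{\mathbb{F}_q}$ by first mapping the vectors
of $\mathcal{P}_{\mathbb{F}_q}$ to $\mathbb{R}^q$ and then borrowing the usual inner product
(dot product) on $\mathbb{R}^q$. Such a mapping should possess the properties stated in the
following lemma.

Lemma 2.2 Let $\mathcal{M}\{\cdot\}$ be a mapping from $\mathcal{P}_{\mathbb{F}_q}$ to
$\mathbb{R}^q$ and a function
$\sigma(\cdot,\cdot) : \mathcal{P}_{\mathbb{F}_q} \times \mathcal{P}_{\mathbb{F}_q} \to \mathbb{R}$
be defined as

\[ \sigma(p(x), r(x)) = \langle \mathcal{M}\{p(x)\}, \mathcal{M}\{r(x)\} \rangle_{\mathbb{R}^q}, \tag{2.19} \]

where $\langle \cdot,\cdot \rangle_{\mathbb{R}^q}$ denotes the usual inner product on
$\mathbb{R}^q$. $\sigma(p(x), r(x))$ is an inner product on $\mathcal{P}_{\mathbb{F}_q}$ if
$\mathcal{M}\{\cdot\}$ is linear and injective (one-to-one).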
The proof of this lemma is given in Appendix A.1.1.

We propose the following mapping from $\mathcal{P}_{\mathbb{F}_q}$ to $\mathbb{R}^q$ and show
later that it is linear and injective

\[ \mathcal{L}\{p(x)\} = \sum_{i \in \mathbb{F}_q} \Big( \log p(i) - \frac{1}{q} \sum_{j \in \mathbb{F}_q} \log p(j) \Big) \mathbf{e}_i, \tag{2.20} \]

where $\mathbf{e}_i$ is the $i$th canonical basis vector² of $\mathbb{R}^q$. The proposal for
$\mathcal{L}\{\cdot\}$ is inspired by the meaning of the angle between two pmfs. The details
of arriving at the definition of $\mathcal{L}\{\cdot\}$ are given in Appendix A.1.2.


Lemma 2.3 The mapping $\mathcal{L}\{\cdot\} : \mathcal{P}_{\mathbb{F}_q} \to \mathbb{R}^q$
defined in (2.20) is linear and injective.

The proof is given in Appendix A.1.3.

It is a common practice to map pmfs to log-probability vectors in the turbo decoding and
sum-product algorithm literature. The main difference between those mappings and the mapping
$\mathcal{L}\{\cdot\}$ that we propose is the normalization
($-\frac{1}{q} \sum_{j \in \mathbb{F}_q} \log p(j)$) in the definition of
$\mathcal{L}\{\cdot\}$. This normalization is necessary to make the operator
$\mathcal{L}\{\cdot\}$ linear and consequently allows us to borrow the inner product on
$\mathbb{R}^q$. In other words, it is this normalization which allows us to construct a
geometric structure on $\mathcal{P}_{\mathbb{F}_q}$. We believe that omitting this
normalization in the literature hindered the discovery of the geometric relations between
pmfs.

Obviously, the mapping $\mathcal{L}\{\cdot\}$ is not the only mapping which satisfies the
conditions imposed by Lemma 2.2. However, $\mathcal{L}\{\cdot\}$ exhibits a symmetric form.
This symmetry leads us to a useful geometric structure on $\mathcal{P}_{\mathbb{F}_q}$.

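Theorem 2.4 The function
$\langle \cdot,\cdot \rangle : \mathcal{P}_{\mathbb{F}_q} \times \mathcal{P}_{\mathbb{F}_q} \to \mathbb{R}$
defined for any $p(x), r(x) \in \mathcal{P}_{\mathbb{F}_q}$ as

\[ \langle p(x), r(x) \rangle = \langle \mathcal{L}\{p(x)\}, \mathcal{L}\{r(x)\} \rangle_{\mathbb{R}^q}, \tag{2.21} \]

where $\mathcal{L}\{\cdot\}$ is defined in (2.20), is an inner product on
$\mathcal{P}_{\mathbb{F}_q}$.

The proof directly follows from Lemma 2.2 and Lemma 2.3.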
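
The role of the normalization can be seen in a short numerical sketch. The helper `log_map` below implements the centered log-probability vector of (2.20) and `inner` the induced inner product; the names and the test values are assumptions for illustration. The sketch also shows that the plain, uncentered log vector is not linear with respect to the addition of pmfs.

```python
import math


def normalize(alpha):
    total = sum(alpha)
    return [a / total for a in alpha]


def add(p, r):
    return normalize([a * b for a, b in zip(p, r)])


def scale(alpha, p):
    return normalize([v ** alpha for v in p])


def log_map(p):
    """Centered log-probability vector: log p(i) minus the mean of all log p(j)."""
    logs = [math.log(v) for v in p]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


def inner(p, r):
    """Inner product of two pmfs: dot product of their centered log vectors."""
    return sum(a * b for a, b in zip(log_map(p), log_map(r)))


p = [0.5, 0.3, 0.2]
r = [0.1, 0.2, 0.7]
a, b = 1.5, -0.5

# Linearity of the centered mapping.
lhs = log_map(add(scale(a, p), scale(b, r)))
rhs = [a * u + b * v for u, v in zip(log_map(p), log_map(r))]
linear_gap = max(abs(u - v) for u, v in zip(lhs, rhs))

# The uncentered log vector misses linearity by a constant offset.
plain_lhs = [math.log(v) for v in add(p, r)]
plain_rhs = [math.log(u) + math.log(v) for u, v in zip(p, r)]
plain_gap = max(abs(u - v) for u, v in zip(plain_lhs, plain_rhs))

print(linear_gap, plain_gap, inner(p, r))
```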

² The canonical basis vectors of $\mathbb{R}^q$ are usually enumerated with integers from 1 up
to $q$. In this thesis we enumerate the canonical basis vectors of $\mathbb{R}^q$ with the
elements of $\mathbb{F}_q$. Since there are $q$ canonical basis vectors of $\mathbb{R}^q$ and
$q$ elements in $\mathbb{F}_q$, there is not any problem in this enumeration.


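The definition of the inner product on $\mathcal{P}_{\mathbb{F}_q}$ can be simplified as
follows.

\[ \begin{aligned}
\langle p(x), r(x) \rangle
&= \langle \mathcal{L}\{p(x)\}, \mathcal{L}\{r(x)\} \rangle_{\mathbb{R}^q} \\
&= \sum_{i \in \mathbb{F}_q} \Big( \log p(i) - \frac{1}{q} \sum_{j \in \mathbb{F}_q} \log p(j) \Big)
   \Big( \log r(i) - \frac{1}{q} \sum_{j \in \mathbb{F}_q} \log r(j) \Big) && (2.22) \\
&= \sum_{i \in \mathbb{F}_q} \log p(i) \log r(i)
   - \frac{1}{q} \Big( \sum_{i \in \mathbb{F}_q} \log p(i) \Big) \Big( \sum_{j \in \mathbb{F}_q} \log r(j) \Big) && (2.23)
\end{aligned} \]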


The equation above resembles the covariance of two random variables. Indeed, it is possible
to express the definition of the inner product in the form of a covariance of two real-valued
random variables, which is shown in Appendix A.1.4.
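
The resemblance to a covariance can be made concrete with a small numerical check. The sketch assumes that the two real-valued random variables are log p(U) and log r(U) with U uniformly distributed on the field, which is one natural choice and may differ from the construction used in the appendix. Under that assumption the inner product equals q times the covariance.

```python
import math


def inner_centered(p, r):
    """Inner product as a dot product of centered log vectors, as in (2.22)."""
    q = len(p)
    mean_p = sum(math.log(v) for v in p) / q
    mean_r = sum(math.log(v) for v in r) / q
    return sum((math.log(a) - mean_p) * (math.log(b) - mean_r) for a, b in zip(p, r))


def inner_expanded(p, r):
    """Expanded form, as in (2.23)."""
    q = len(p)
    cross = sum(math.log(a) * math.log(b) for a, b in zip(p, r))
    return cross - sum(math.log(a) for a in p) * sum(math.log(b) for b in r) / q


def q_times_covariance(p, r):
    """q * Cov(log p(U), log r(U)) for U uniform on the alphabet (assumed setup)."""
    q = len(p)
    e_ab = sum(math.log(a) * math.log(b) for a, b in zip(p, r)) / q
    e_a = sum(math.log(a) for a in p) / q
    e_b = sum(math.log(b) for b in r) / q
    return q * (e_ab - e_a * e_b)


p = [0.5, 0.3, 0.2]      # hypothetical pmfs on F_3
r = [0.1, 0.2, 0.7]
print(inner_centered(p, r), inner_expanded(p, r), q_times_covariance(p, r))
```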


The vector space $\mathcal{P}_{\mathbb{F}_q}$ evolves into an inner product space by the
definition of the inner product in (2.21). Although we have not yet shown what
$\dim \mathcal{P}_{\mathbb{F}_q}$ is³, we can conclude that $\mathcal{P}_{\mathbb{F}_q}$ is
finite dimensional since there exists an injective linear mapping from
$\mathcal{P}_{\mathbb{F}_q}$ to $\mathbb{R}^q$. It is well known from functional analysis that
any finite dimensional inner product space is complete. Therefore,
$\mathcal{P}_{\mathbb{F}_q}$ is a Hilbert space.


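2.5.1 The norm, distance, and angle on $\mathcal{P}_{\mathbb{F}_q}$


The inner product on $\mathcal{P}_{\mathbb{F}_q}$ induces the following norm on
$\mathcal{P}_{\mathbb{F}_q}$

\[ \begin{aligned}
\|p(x)\| &= \sqrt{\langle p(x), p(x) \rangle} && (2.24) \\
&= \sqrt{ \sum_{i \in \mathbb{F}_q} (\log p(i))^2 - \frac{1}{q} \Big( \sum_{j \in \mathbb{F}_q} \log p(j) \Big)^2 }. && (2.25)
\end{aligned} \]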

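A distance function between two pmfs can be obtained by combining this norm with the
definition of subtraction in $\mathcal{P}_{\mathbb{F}_q}$ as in

\[ \begin{aligned}
D(p(x), r(x)) &= \|p(x) \boxminus r(x)\| && (2.26) \\
&= \sqrt{ \sum_{i \in \mathbb{F}_q} \Big( \log \frac{p(i)}{r(i)} \Big)^2 - \frac{1}{q} \Big( \sum_{j \in \mathbb{F}_q} \log \frac{p(j)}{r(j)} \Big)^2 }. && (2.27)
\end{aligned} \]

Since $\|\cdot\|$ is a proper norm, this distance is a metric distance. In other words, it is
nonnegative, symmetric, and it satisfies the triangle inequality.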
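
A numerical sketch of the norm and the distance is given below. Subtraction is implemented as adding the additive inverse, that is, a normalized pointwise ratio; the helper names and the three test pmfs are assumptions for illustration. The checks cover symmetry, the triangle inequality, and a zero distance from a pmf to itself.

```python
import math


def normalize(alpha):
    total = sum(alpha)
    return [a / total for a in alpha]


def subtract(p, r):
    """p minus r in the space of pmfs: normalized pointwise ratio."""
    return normalize([a / b for a, b in zip(p, r)])


def log_map(p):
    logs = [math.log(v) for v in p]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


def norm(p):
    return math.sqrt(sum(v * v for v in log_map(p)))


def distance(p, r):
    return norm(subtract(p, r))


p = [0.5, 0.3, 0.2]
r = [0.1, 0.2, 0.7]
s = [0.25, 0.25, 0.5]

print(distance(p, r), distance(r, p))                       # symmetry
print(distance(p, s) <= distance(p, r) + distance(r, s))    # triangle inequality
print(distance(p, p))                                       # zero up to rounding
print(norm([1.0 / 3.0] * 3))                                # uniform pmf has zero norm
```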


³ We are going to show that $\dim \mathcal{P}_{\mathbb{F}_q} = q - 1$ in Theorem 2.7.
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Similar to any Hilbert space, the angle between any two pmfs p{x), r{x) in is given by 


Z{p{x), r{x)) = arccos 


< p{x),r{x) > 
\\p{x)\\ ||r(v)|| ■ 


( 2 . 28 ) 
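The norm, distance, and angle can be sketched in a few lines. This is only an illustration under stated assumptions: pmfs are lists of strictly positive probabilities, and the subtraction $p(x) \boxminus r(x)$ is taken to be the normalized pointwise ratio $p(i)/r(i)$, which is what (2.27) implies. The example pmfs `p`, `r`, `t` are hypothetical.

```python
import math

def inner(p, r):
    """Inner product of two pmfs in the expanded form (2.23)."""
    lp = [math.log(v) for v in p]
    lr = [math.log(v) for v in r]
    return sum(a * b for a, b in zip(lp, lr)) - sum(lp) * sum(lr) / len(p)

def norm(p):
    """Norm induced by the inner product; max() guards against tiny negative rounding."""
    return math.sqrt(max(inner(p, p), 0.0))

def distance(p, r):
    """Distance as the norm of the normalized pointwise ratio p(i)/r(i)."""
    ratio = [a / b for a, b in zip(p, r)]
    total = sum(ratio)
    return norm([v / total for v in ratio])

def angle(p, r):
    """Angle between two pmfs with nonzero norm (neither may be uniform)."""
    c = inner(p, r) / (norm(p) * norm(r))
    return math.acos(max(-1.0, min(1.0, c)))

# Hypothetical pmfs on F_3.
p = [0.5, 0.3, 0.2]
r = [0.1, 0.6, 0.3]
t = [0.25, 0.25, 0.5]
print(norm(p), distance(p, r), angle(p, r))
```

Note that the uniform pmf has zero norm, so the angle is undefined whenever one of the arguments is uniform.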


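2.5.2 The pseudo inverse of $\mathcal{L}\{\cdot\}$

Lemma 2.5 For any $p(x)$ in $\mathcal{P}_{\mathbb{F}_q}$

$$
\mathcal{L}\{p(x)\} \perp \mathbf{1}, \tag{2.29}
$$

where $\mathbf{1}$ denotes the all-one vector in $\mathbb{R}^q$.

The proof is given in Appendix A.1.5.

Since $\mathcal{L}\{p(x)\}$ is always orthogonal to $\mathbf{1}$, the mapping $\mathcal{L}\{\cdot\}$ is not a surjection (onto). Consequently, it is not
a bijection (injection and surjection). A mapping which is not a bijection does not have an
inverse. Nonetheless, a pseudo inverse for $\mathcal{L}\{\cdot\}$ exists which satisfies

$$
\mathcal{L}^{+}\{\mathcal{L}\{p(x)\}\}(x) = p(x),
$$

where $\mathcal{L}^{+}\{\cdot\}(x)$ denotes the pseudo inverse of $\mathcal{L}\{\cdot\}$.

$\mathcal{L}^{+}\{\cdot\}(x)$ is a mapping from $\mathbb{R}^q$ to $\mathcal{P}_{\mathbb{F}_q}$. We propose the following definition for $\mathcal{L}^{+}\{\cdot\}(x)$

$$
\mathcal{L}^{+}\{\mathbf{p}\}(x) = \mathcal{C}_{\mathbb{F}_q}\Big\{ \exp\Big( -\frac{1}{2} \|\mathbf{p} - \mathbf{s}(x)\|^2 \Big) \Big\}, \tag{2.30}
$$

where $\mathbf{p}$ is any vector in $\mathbb{R}^q$ and $\mathbf{s}(x)$ is the vector-valued function from $\mathbb{F}_q$ to $\mathbb{R}^q$ given by

$$
\mathbf{s}(x) \triangleq \mathbf{e}_x - \frac{1}{q} \mathbf{1}. \tag{2.31}
$$

The definition of $\mathcal{L}^{+}\{\cdot\}(x)$ can be interpreted as in

$$
\mathcal{L}^{+}\{\mathbf{p}\}(x) = \Pr\{X = x \mid \mathbf{s}(X) + \mathbf{N} = \mathbf{p}\}, \tag{2.32}
$$

where $\mathbf{N}$ is a random vector whose components are all independent, real, zero-mean Gaussian
random variables with unit variance. Furthermore, notice that the function $\mathbf{s}(x)$ maps the
elements of $\mathbb{F}_q$ to $\mathbb{R}^q$ as in the simplex modulation.

Lemma 2.6 $\mathcal{L}^{+}\{\cdot\}(x) : \mathbb{R}^q \to \mathcal{P}_{\mathbb{F}_q}$ defined in (2.30) satisfies

$$
\mathcal{L}^{+}\{\mathcal{L}\{p(x)\}\}(x) = p(x) \tag{2.33}
$$

for all $p(x)$ in $\mathcal{P}_{\mathbb{F}_q}$. Moreover,

$$
\mathcal{L}\{\mathcal{L}^{+}\{\mathbf{p}\}(x)\} = \mathbf{p} \tag{2.34}
$$

if $\mathbf{p} \perp \mathbf{1}$.

The proof is given in Appendix A.1.6.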
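Both parts of Lemma 2.6 can be checked numerically. The sketch below assumes that the exponent in (2.30) is $-\frac{1}{2}\|\mathbf{p} - \mathbf{s}(x)\|^2$, which matches the unit-variance Gaussian interpretation in (2.32); the function names and test vectors are illustrative.

```python
import math

def log_map(p):
    """The mapping L: centered logarithm of a strictly positive pmf."""
    logs = [math.log(v) for v in p]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]

def pseudo_inverse(v):
    """Assumed form of L^+: normalize exp(-||v - s(x)||^2 / 2) over x,
    with s(x) = e_x - (1/q) * all-one vector."""
    q = len(v)
    weights = []
    for x in range(q):
        s = [(1.0 if i == x else 0.0) - 1.0 / q for i in range(q)]
        d2 = sum((a - b) ** 2 for a, b in zip(v, s))
        weights.append(math.exp(-0.5 * d2))
    total = sum(weights)
    return [w / total for w in weights]

p = [0.5, 0.3, 0.2]                   # hypothetical pmf on F_3
back = pseudo_inverse(log_map(p))     # first part of the lemma: should return p

v = [0.7, -0.2, -0.5]                 # sums to zero, hence orthogonal to the all-one vector
again = log_map(pseudo_inverse(v))    # second part of the lemma: should return v
print(back, again)
```

The round trip works because, for a vector orthogonal to $\mathbf{1}$, the squared distance to $\mathbf{s}(x)$ differs from $-2 v_x$ only by a term that does not depend on $x$, so the normalized weights are proportional to $\exp(v_x)$.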

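Theorem 2.7 $\mathcal{P}_{\mathbb{F}_q}$ is a $q - 1$ dimensional Hilbert space, i.e.,

$$
\dim \mathcal{P}_{\mathbb{F}_q} = q - 1 \tag{2.35}
$$

Proof. Due to the rank-nullity theorem in linear algebra

$$
\dim \mathcal{P}_{\mathbb{F}_q} = \dim \operatorname{im}\{\mathcal{L}\} + \dim \ker\{\mathcal{L}\},
$$

where $\operatorname{im}\{\mathcal{L}\}$ and $\ker\{\mathcal{L}\}$ denote the image and kernel (null space) of $\mathcal{L}\{\cdot\}$ respectively. Since
$\mathcal{L}\{\cdot\}$ is shown to be an injection in Lemma 2.3, $\ker\{\mathcal{L}\}$ only contains the zero vector. It can be deduced
from Lemma 2.5 that the image (range space) of $\mathcal{L}\{\cdot\}$ is a subset of $\mathbf{1}^{\perp}$, where $\mathbf{1}^{\perp}$ is the
subspace of $\mathbb{R}^q$ given by

$$
\mathbf{1}^{\perp} \triangleq \{ \mathbf{p} \in \mathbb{R}^q : \langle \mathbf{p}, \mathbf{1} \rangle_{\mathbb{R}^q} = 0 \} \tag{2.36}
$$

The second part of Lemma 2.6 improves this result as it clearly shows that the image of $\mathcal{L}\{\cdot\}$
is exactly equal to $\mathbf{1}^{\perp}$. Therefore,

$$
\dim \mathcal{P}_{\mathbb{F}_q} = \dim \mathbf{1}^{\perp} + \dim \{\mathbf{0}\} = q - 1, \tag{2.37}
$$

which completes the proof. ■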


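2.5.3 A set of orthonormal basis pmfs for $\mathcal{P}_{\mathbb{F}_q}$


A set of $q - 1$ linearly independent vectors is necessary to form a basis for $\mathcal{P}_{\mathbb{F}_q}$. An orthonormal
basis for $\mathcal{P}_{\mathbb{F}_q}$ can be obtained by finding a set of orthonormal vectors in $\mathbf{1}^{\perp}$ and then by
mapping these vectors to $\mathcal{P}_{\mathbb{F}_q}$ via $\mathcal{L}^{+}\{\cdot\}(x)$. Let $q - 1$ vectors in $\mathbb{R}^q$ be defined as

$$
\begin{aligned}
\mathbf{s}_1 &= \Big[ \tfrac{1}{\sqrt{2}}, \; -\tfrac{1}{\sqrt{2}}, \; 0, \; \ldots, \; 0 \Big] \\
\mathbf{s}_2 &= \Big[ \tfrac{1}{\sqrt{6}}, \; \tfrac{1}{\sqrt{6}}, \; -\tfrac{2}{\sqrt{6}}, \; 0, \; \ldots, \; 0 \Big] \\
&\;\;\vdots \\
\mathbf{s}_{q-1} &= \Big[ \tfrac{1}{\sqrt{q(q-1)}}, \; \tfrac{1}{\sqrt{q(q-1)}}, \; \ldots, \; \tfrac{1}{\sqrt{q(q-1)}}, \; -\tfrac{q-1}{\sqrt{q(q-1)}} \Big]
\end{aligned}
\tag{2.38}
$$

Clearly, all of these vectors are in $\mathbf{1}^{\perp}$ and they are mutually orthonormal. $q - 1$ pmfs in
$\mathcal{P}_{\mathbb{F}_q}$ can be obtained by mapping these vectors to $\mathcal{P}_{\mathbb{F}_q}$ via $\mathcal{L}^{+}\{\cdot\}(x)$ as follows.

$$
s_i(x) = \mathcal{L}^{+}\{\mathbf{s}_i\}(x) \quad \text{for } i = 1, 2, \ldots, q - 1. \tag{2.39}
$$

Due to the definition of the inner product and the second part of Lemma 2.6

$$
\begin{aligned}
\langle s_i(x), s_j(x) \rangle &= \langle \mathcal{L}\{\mathcal{L}^{+}\{\mathbf{s}_i\}(x)\}, \mathcal{L}\{\mathcal{L}^{+}\{\mathbf{s}_j\}(x)\} \rangle_{\mathbb{R}^q} \\
&= \langle \mathbf{s}_i, \mathbf{s}_j \rangle_{\mathbb{R}^q} \\
&= \begin{cases} 1, & \text{for } i = j \\ 0, & \text{for } i \neq j \end{cases}
\end{aligned}
\tag{2.40}
$$

Therefore, $\{s_1(x), s_2(x), \ldots, s_{q-1}(x)\}$ is an orthonormal basis for $\mathcal{P}_{\mathbb{F}_q}$.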


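Example 2.3 In Example 2.1, basic algebraic relations between six pmfs, which are in $\mathcal{P}_{\mathbb{F}_3}$,
are investigated. An orthonormal basis for $\mathcal{P}_{\mathbb{F}_3}$ is composed of two pmfs. $s_1(x)$ and $s_2(x)$ given
below form such a basis for $\mathcal{P}_{\mathbb{F}_3}$.

$$
s_1(x) = \mathcal{L}^{+}\Big\{ \Big[ \tfrac{1}{\sqrt{2}}, \; -\tfrac{1}{\sqrt{2}}, \; 0 \Big] \Big\}(x) \approx
\begin{cases} 0.57598, & x = 0 \\ 0.14002, & x = 1 \\ 0.28400, & x = 2 \end{cases}
\tag{2.41}
$$

$$
s_2(x) = \mathcal{L}^{+}\Big\{ \Big[ \tfrac{1}{\sqrt{6}}, \; \tfrac{1}{\sqrt{6}}, \; -\tfrac{2}{\sqrt{6}} \Big] \Big\}(x) \approx
\begin{cases} 0.43595, & x = 0 \\ 0.43595, & x = 1 \\ 0.12810, & x = 2 \end{cases}
\tag{2.42}
$$

The coordinates of a pmf in $\mathcal{P}_{\mathbb{F}_3}$ with respect to (w.r.t.) the basis $\{s_1(x), s_2(x)\}$ are simply the
inner products of the pmf with $s_1(x)$ and $s_2(x)$. For instance, $r(x)$ mentioned in Example 2.1
can be expressed as

$$
\begin{aligned}
r(x) &= \langle r(x), s_1(x) \rangle \boxtimes s_1(x) \boxplus \langle r(x), s_2(x) \rangle \boxtimes s_2(x) \\
&= -1.5537 \boxtimes s_1(x) \boxplus -0.89701 \boxtimes s_2(x).
\end{aligned}
\tag{2.43}
$$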

The coordinates of all the pmfs mentioned in Example 2.1 are given in the table below and
depicted in Figure 2.2.

Table 2.2: Coordinates of the pmfs mentioned in Examples 2.1 and 2.3

|          | r(x)     |          | o(x)    | b(x)    | g(x)     | p(x)     |
|----------|----------|----------|---------|---------|----------|----------|
| s_1(x)   | -1.55367 | 1.55367  | 0       | 0.77684 | -0.77684 | 0        |
| s_2(x)   | -0.89701 | -0.89701 | 1.79403 | 0.44851 | 0.44851  | -0.89701 |

Figure 2.2: Plot of the pmfs mentioned in Examples 2.1 and 2.3
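The basis pmfs of Example 2.3 can be reproduced numerically. The sketch below is illustrative only: it assumes the basis vectors follow the pattern of (2.38) (the $k$-th vector has $k$ entries equal to $1/\sqrt{k(k+1)}$, then $-k/\sqrt{k(k+1)}$, then zeros), and it uses the fact that for a vector orthogonal to the all-one vector the pseudo inverse reduces to normalizing the elementwise exponential. The function names are hypothetical.

```python
import math

def basis_vectors(q):
    """Orthonormal vectors in the subspace orthogonal to the all-one vector of R^q."""
    vecs = []
    for k in range(1, q):
        c = 1.0 / math.sqrt(k * (k + 1))
        vecs.append([c] * k + [-k * c] + [0.0] * (q - k - 1))
    return vecs

def to_pmf(v):
    """Pseudo inverse restricted to vectors orthogonal to the all-one vector:
    normalize exp(v) elementwise."""
    w = [math.exp(a) for a in v]
    total = sum(w)
    return [a / total for a in w]

s1_pmf, s2_pmf = (to_pmf(v) for v in basis_vectors(3))
print([round(v, 5) for v in s1_pmf])
print([round(v, 5) for v in s2_pmf])
```

For $q = 3$ the printed values should agree with (2.41) and (2.42) to about four decimal places.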


2.6 Relation to the Hilbert space of random variables 

The Hilbert space of probability mass functions of finite field-valued random variables might
be confused with the Hilbert space of random variables with a finite second order moment,
which is already well known [31]. However, these two Hilbert spaces are quite different
from each other. First of all, the vectors of the former Hilbert space are pmfs of the random
variables whereas the vectors of the latter Hilbert space are the random variables themselves.
Second, the former Hilbert space is related to finite field-valued random variables whereas
the latter is related to complex-valued random variables. Finally, the former is meaningful
in the Bayesian detection sense whereas the latter is not.

Although this thesis is about the Hilbert space of the pmfs of finite field-valued random
variables, it is appropriate to summarize the Hilbert space of random variables. The set of
complex-valued random variables forms a vector space with the usual random variable addition and
scaling over $\mathbb{C}$. This vector space can be endowed with the following inner product, which is
nothing but the autocorrelation between two random variables.

$$
\langle X, Y \rangle = E[XY^{*}], \tag{2.44}
$$

where $X$ and $Y$ are two complex-valued random variables and $E[\cdot]$ denotes the expectation.
The set of complex-valued random variables with finite second order moment is complete
w.r.t. the norm induced by the inner product above. Therefore, this set forms a Hilbert space
over $\mathbb{C}$ with the usual random variable addition, scaling, and the inner product given in (2.44).
Many important algorithms, such as the Wiener filter, rely upon the orthogonality in this
Hilbert space.

Notice that the Hilbert space structure over random variables is constructed over complex-valued
random variables. Although it is also possible to construct a similar vector space over
the set of finite field-valued random variables, the vector space of finite field-valued random
variables does not have an inner product. In other words, the set of $\mathbb{F}_q$-valued random variables
forms a vector space with the usual random variable addition and scaling over $\mathbb{F}_q$. In
contrast to the complex-valued random variable case, the expected value is not a well defined
concept for finite field-valued random variables. Consequently, the autocorrelation between
two $\mathbb{F}_q$-valued random variables is not well defined either. Therefore, we cannot construct a
Hilbert space structure over the set of $\mathbb{F}_q$-valued random variables as we could for the
complex-valued random variables. If we had a Hilbert space structure over the set of $\mathbb{F}_q$-valued random
variables then we would have decoding algorithms for linear channel codes with polynomial
complexity.


2.6.1 Comparison between the convergence of random variables and pmfs 

Another possible confusion might arise between the convergence of finite field-valued random
variables and the convergence of pmfs of finite field-valued random variables. As explained
above, expectation is not well defined for finite field-valued random variables. Therefore,
convergence in the mean square sense is not well defined for finite field-valued random variables
either. On the other hand, convergence modes such as convergence almost everywhere and
convergence in probability can still be well defined. However, due to the topological nature
of the finite fields these two convergence modes are essentially equivalent. Convergence of a
sequence of finite field-valued random variables in probability is formally defined below.

Definition 2 Convergence of a sequence of finite field-valued random variables in probability:
A sequence of $\mathbb{F}_q$-valued random variables, $\{X_n\}_{n=1}^{\infty}$, converges in probability to an
$\mathbb{F}_q$-valued random variable $X$ if and only if for each $\epsilon > 0$ there exists an integer $N$ such that

$$
n > N \implies \Pr\{X_n = X\} > 1 - \epsilon \tag{2.45}
$$

and this convergence is denoted by

$$
\lim_{n \to \infty} \Pr\{X_n = X\} = 1. \tag{2.46}
$$

Convergence of $\mathbb{F}_q$-valued random variables in probability might be confused with the
convergence of pmfs in $\mathcal{P}_{\mathbb{F}_q}$. The following example aims to clarify the distinction between these
two convergences.


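Example 2.4 Let the event space of an experiment, $\Omega$, be $[0, 1] \subset \mathbb{R}$ and let each outcome of the
experiment be equally likely, i.e.

$$
\Pr\{\omega < c\} = c, \tag{2.47}
$$

where $\omega$ denotes the outcome of the experiment. A sequence of $\mathbb{F}_2$-valued random variables,
$\{X_n\}_{n=1}^{\infty}$, is assigned to this experiment as follows.

$$
X_n = \begin{cases} 0, & \omega \in [0, 1 - 2^{-n}] \\ 1, & \omega \in (1 - 2^{-n}, 1] \end{cases}
\tag{2.48}
$$

Clearly, the sequence $\{X_n\}_{n=1}^{\infty}$ converges in probability to a random variable $X$ which is
defined as

$$
X = \begin{cases} 0, & \omega \in [0, 1] \\ 1, & \omega \in \emptyset \end{cases}
\tag{2.49}
$$

In other words,

$$
\lim_{n \to \infty} \Pr\{X_n = X\} = 1. \tag{2.50}
$$

Let a sequence $\{p_n(x)\}_{n=1}^{\infty}$ of pmfs in $\mathcal{P}_{\mathbb{F}_2}$ be defined as

$$
p_n(x) = \Pr\{X_n = x\} = \begin{cases} 1 - 2^{-n}, & x = 0 \\ 2^{-n}, & x = 1 \end{cases}
\tag{2.51}
$$

and let $p(x)$ denote $\Pr\{X = x\}$. Due to the basic axioms of probability

$$
p(x) = \begin{cases} 1, & x = 0 \\ 0, & x = 1 \end{cases}
\tag{2.52}
$$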


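It might appear at first glance that the sequence $\{p_n(x)\}_{n=1}^{\infty}$ converges to $p(x)$. However, this
would contradict the completeness of $\mathcal{P}_{\mathbb{F}_2}$ since $p(x) \notin \mathcal{P}_{\mathbb{F}_2}$. The truth is that $\{p_n(x)\}_{n=1}^{\infty}$ is not
a Cauchy sequence in $\mathcal{P}_{\mathbb{F}_2}$. This fact can be shown as follows. For any $m > n > 0$, evaluating (2.27) with $q = 2$ gives

$$
D(p_m(x), p_n(x)) = \frac{1}{\sqrt{2}} \left| \log \frac{p_m(0)}{p_n(0)} - \log \frac{p_m(1)}{p_n(1)} \right| .
$$

Since $p_m(0) > p_n(0)$,

$$
D(p_m(x), p_n(x)) > \frac{1}{\sqrt{2}} \log \frac{p_n(1)}{p_m(1)} = \frac{(m - n) \log 2}{\sqrt{2}}. \tag{2.53}
$$

Therefore, $\{p_n(x)\}_{n=1}^{\infty}$ is not a Cauchy sequence and the limit $\lim_{n \to \infty} p_n(x)$ does not exist. This
example demonstrates that convergence of a sequence of random variables in probability does
not imply the convergence of their pmfs.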
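The failure of the Cauchy property in Example 2.4 can be observed numerically. The sketch below evaluates the distance (2.27) between consecutive pmfs $p_{n+1}(x)$ and $p_n(x)$; the helper names are illustrative. With natural logarithms, the distances should stay above $\log 2 / \sqrt{2} \approx 0.49$ instead of shrinking toward zero.

```python
import math

def distance(p, r):
    """Distance between two pmfs following (2.27)."""
    d = [math.log(a / b) for a, b in zip(p, r)]
    return math.sqrt(sum(v * v for v in d) - sum(d) ** 2 / len(d))

def p_n(n):
    """The pmf of X_n from the example: [Pr{X_n = 0}, Pr{X_n = 1}]."""
    return [1.0 - 2.0 ** (-n), 2.0 ** (-n)]

bound = math.log(2.0) / math.sqrt(2.0)
for n in range(1, 8):
    print(n, distance(p_n(n + 1), p_n(n)), bound)
```

The printed distances decrease toward the bound from above, so consecutive elements of the sequence never get arbitrarily close to each other.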

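2.7 The Hilbert space of multivariate pmfs

The construction of the Hilbert space on $\mathcal{P}_{\mathbb{F}_q}$ can be applied to the set of multivariate (joint)
pmfs as well. Basically, we should replace the indeterminate variable $x$ in the Hilbert space
of pmfs with a vector $\mathbf{x}$ while constructing the Hilbert space structure on multivariate pmfs.

Let $\mathbf{X} = [X_1, X_2, \ldots, X_N]$ be a random vector where $X_i$ is an $\mathbb{F}_q$-valued random variable.
Furthermore, let $\mathcal{P}_{\mathbb{F}_q^N}$ denote the set of all strictly positive pmfs that $\mathbf{X}$ might possess, i.e.

$$
\mathcal{P}_{\mathbb{F}_q^N} \triangleq \Big\{ p(\mathbf{x}) : \mathbb{F}_q^N \to (0, 1) \; : \; \sum_{\mathbf{x} \in \mathbb{F}_q^N} p(\mathbf{x}) = 1 \Big\}
\tag{2.54}
$$

The addition and scalar multiplication on $\mathcal{P}_{\mathbb{F}_q^N}$ can be defined for any $p_1(\mathbf{x}), p_2(\mathbf{x}), p(\mathbf{x}) \in \mathcal{P}_{\mathbb{F}_q^N}$
and $\alpha \in \mathbb{R}$ as

$$
p_1(\mathbf{x}) \boxplus p_2(\mathbf{x}) \triangleq \mathcal{C}_{\mathbb{F}_q^N}\{ p_1(\mathbf{x}) p_2(\mathbf{x}) \} \tag{2.55}
$$

$$
\alpha \boxtimes p(\mathbf{x}) \triangleq \mathcal{C}_{\mathbb{F}_q^N}\{ (p(\mathbf{x}))^{\alpha} \} \tag{2.56}
$$

The normalization operator in the multivariate case, which is denoted by $\mathcal{C}_{\mathbb{F}_q^N}\{\cdot\}$ above, maps
any strictly positive function of $\mathbb{F}_q^N$, $\alpha(\mathbf{x})$, to a pmf in $\mathcal{P}_{\mathbb{F}_q^N}$ as follows.

$$
\mathcal{C}_{\mathbb{F}_q^N}\{\alpha(\mathbf{x})\} \triangleq \frac{\alpha(\mathbf{x})}{\sum_{\mathbf{i} \in \mathbb{F}_q^N} \alpha(\mathbf{i})} \tag{2.57}
$$

Similar to the univariate case, $\mathcal{P}_{\mathbb{F}_q^N}$ together with the $\boxplus$ and $\boxtimes$ operations forms a vector space
over $\mathbb{R}$.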

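The analogue of the mapping $\mathcal{L}\{\cdot\}$ in the multivariate case is denoted by $\mathcal{L}_N\{\cdot\}$ and maps the
pmfs in $\mathcal{P}_{\mathbb{F}_q^N}$ to $\mathbb{R}^{q^N}$. Before giving the definition of $\mathcal{L}_N\{\cdot\}$ we need to establish a one-to-one
matching between the vectors in $\mathbb{F}_q^N$ and the canonical basis vectors of $\mathbb{R}^{q^N}$. We can do this
matching since $\mathbb{F}_q^N$ contains $q^N$ vectors, which is equal to the dimension of $\mathbb{R}^{q^N}$. Since the
mapping $\mathcal{L}_N\{\cdot\}$ is going to be employed in borrowing the inner product in $\mathbb{R}^{q^N}$, the order of
matching is not important.


Using this matching $\mathcal{L}_N\{\cdot\}$ is defined as

$$
\mathcal{L}_N\{p(\mathbf{x})\} : \mathcal{P}_{\mathbb{F}_q^N} \to \mathbb{R}^{q^N} \triangleq \sum_{\mathbf{i} \in \mathbb{F}_q^N} \Big( \log p(\mathbf{i}) - \frac{1}{q^N} \sum_{\mathbf{j} \in \mathbb{F}_q^N} \log p(\mathbf{j}) \Big) \mathbf{e}_{\mathbf{i}},
\tag{2.58}
$$

where $\mathbf{e}_{\mathbf{i}}$ denotes the canonical basis vector of $\mathbb{R}^{q^N}$ matched to $\mathbf{i} \in \mathbb{F}_q^N$. $\mathcal{L}_N\{\cdot\}$ is a linear and
injective mapping as $\mathcal{L}\{\cdot\}$ is. Then the inner product of any two $p(\mathbf{x}), r(\mathbf{x}) \in \mathcal{P}_{\mathbb{F}_q^N}$ becomes

$$
\begin{aligned}
\langle p(\mathbf{x}), r(\mathbf{x}) \rangle &\triangleq \langle \mathcal{L}_N\{p(\mathbf{x})\}, \mathcal{L}_N\{r(\mathbf{x})\} \rangle_{\mathbb{R}^{q^N}} \qquad (2.59) \\
&= \sum_{\mathbf{i} \in \mathbb{F}_q^N} \log p(\mathbf{i}) \log r(\mathbf{i}) - \frac{1}{q^N} \Big( \sum_{\mathbf{i} \in \mathbb{F}_q^N} \log p(\mathbf{i}) \Big) \Big( \sum_{\mathbf{i} \in \mathbb{F}_q^N} \log r(\mathbf{i}) \Big) \qquad (2.60)
\end{aligned}
$$

The definition of inner product makes $\mathcal{P}_{\mathbb{F}_q^N}$ an inner product space. Since $\mathcal{P}_{\mathbb{F}_q^N}$ is definitely
finite dimensional it is also a Hilbert space.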


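The pseudo inverse of $\mathcal{L}_N\{\cdot\}$ is

$$
\mathcal{L}_N^{+}\{\mathbf{p}\}(\mathbf{x}) : \mathbb{R}^{q^N} \to \mathcal{P}_{\mathbb{F}_q^N} \triangleq \mathcal{C}_{\mathbb{F}_q^N}\Big\{ \exp\Big( -\frac{1}{2} \|\mathbf{p} - \mathbf{s}_N(\mathbf{x})\|^2 \Big) \Big\},
\tag{2.61}
$$

where $\mathbf{s}_N(\mathbf{x})$ is

$$
\mathbf{s}_N(\mathbf{x}) \triangleq \mathbf{e}_{\mathbf{x}} - \frac{1}{q^N} \mathbf{1}. \tag{2.62}
$$

The vector $\mathbf{1}$ above denotes the all-one vector in $\mathbb{R}^{q^N}$. Similar to the univariate case it can be
shown that $\mathcal{L}_N^{+}\{\cdot\}(\mathbf{x})$ satisfies

$$
\mathcal{L}_N^{+}\{\mathcal{L}_N\{p(\mathbf{x})\}\}(\mathbf{x}) = p(\mathbf{x}) \quad \forall p(\mathbf{x}) \in \mathcal{P}_{\mathbb{F}_q^N} \tag{2.63}
$$

$$
\mathcal{L}_N\{\mathcal{L}_N^{+}\{\mathbf{p}\}(\mathbf{x})\} = \mathbf{p} \quad \forall \mathbf{p} \in \mathbf{1}^{\perp} \subset \mathbb{R}^{q^N} \tag{2.64}
$$

Consequently,

$$
\operatorname{im}\{\mathcal{L}_N\} = \mathbf{1}^{\perp} \subset \mathbb{R}^{q^N} \tag{2.65}
$$

Theorem 2.8 $\mathcal{P}_{\mathbb{F}_q^N}$ is a $q^N - 1$ dimensional Hilbert space, i.e.

$$
\dim \mathcal{P}_{\mathbb{F}_q^N} = q^N - 1 \tag{2.66}
$$

Proof. Due to the rank-nullity theorem in linear algebra

$$
\begin{aligned}
\dim \mathcal{P}_{\mathbb{F}_q^N} &= \dim \ker\{\mathcal{L}_N\} + \dim \operatorname{im}\{\mathcal{L}_N\} \qquad (2.67) \\
&= q^N - 1. \qquad (2.68)
\end{aligned}
$$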

As a minor consequence of this theorem we can conclude that $\mathcal{P}_{\mathbb{F}_q^N}$ is isomorphic to $\mathcal{P}_{\mathbb{F}_{q^N}}$.
This is a quite expected result since $\mathbb{F}_{q^N}$ is isomorphic to $\mathbb{F}_q^N$ as a vector space over $\mathbb{F}_q$.

CHAPTER 3


THE CANONICAL FACTORIZATION OF
MULTIVARIATE PROBABILITY MASS FUNCTIONS


3.1 Introduction

The factorization of a multivariate pmf is important in many aspects. For instance, the
conditional dependence of the random variables distributed by a pmf can be determined by how the
pmf factors. Existence of low complexity maximization and marginalization algorithms for a
multivariate pmf, such as Viterbi and BCJR, also depends on the factorization of the pmf. A
very special factorization of multivariate pmfs, which we call the canonical factorization, is
introduced in this chapter.

This chapter begins with representing the factorization of a pmf in $\mathcal{P}_{\mathbb{F}_q^N}$. Then we introduce
the soft parity check constraints, using which we decompose $\mathcal{P}_{\mathbb{F}_q^N}$ into orthogonal subspaces.
Finally, we obtain the canonical factorization of pmfs as the projection of pmfs onto these
subspaces.

3.2 Representing the factorization of pmfs

The Hilbert space $\mathcal{P}_{\mathbb{F}_q^N}$ provides a suitable environment for analyzing the factorization of
multivariate pmfs. Suppose that a pmf in $\mathcal{P}_{\mathbb{F}_q^N}$ can be factored as

$$
p(\mathbf{x}) = \prod_{i=1}^{K} \phi_i(\mathbf{x}). \tag{3.1}
$$

Each $\phi_i(\mathbf{x})$ function appearing above may be called a factor function, a local function, a
constraint, or an interaction. The factor functions are not necessarily pmfs but they can be
assumed to be positive. Hence, we can obtain a pmf in $\mathcal{P}_{\mathbb{F}_q^N}$ by scaling the factor functions as
in

$$
\begin{aligned}
r_i(\mathbf{x}) &\triangleq \mathcal{C}_{\mathbb{F}_q^N}\{\phi_i(\mathbf{x})\} \qquad (3.2) \\
&= \frac{1}{\gamma_i} \phi_i(\mathbf{x}), \qquad (3.3)
\end{aligned}
$$

where $\gamma_i = \sum_{\mathbf{x} \in \mathbb{F}_q^N} \phi_i(\mathbf{x})$. After this normalization the factorization in (3.1) becomes

$$
p(\mathbf{x}) = \prod_{i=1}^{K} \gamma_i r_i(\mathbf{x}) \tag{3.4}
$$

which can be represented using the addition in $\mathcal{P}_{\mathbb{F}_q^N}$ as

$$
p(\mathbf{x}) = \mathop{\boxplus}_{i=1}^{K} r_i(\mathbf{x}).
$$

This representation suggests that a multivariate pmf in $\mathcal{P}_{\mathbb{F}_q^N}$ can be factored by expressing it
as a linear combination of some basis vectors (pmfs) in $\mathcal{P}_{\mathbb{F}_q^N}$. If these basis pmfs are chosen
to be orthogonal then we can employ the inner product on $\mathcal{P}_{\mathbb{F}_q^N}$ to determine the expansion
coefficients. However, the basis pmfs should be selected in such a way that the resulting
factorization becomes useful.

We know from the literature on the sum-product algorithm [1], [2], [6], [7] and Markov random
fields [17], [18], [19], [20] that the factorization of $p(\mathbf{x})$ given in (3.1) is useful if the factor
functions on the right hand side of (3.1) are local. A factor function of $p(\mathbf{x})$ is said to be local if it
depends on some but not all of the components of the argument vector $\mathbf{x}$. Therefore, the basis
functions mentioned in the paragraph above should also be selected to be as local as possible.

3.3 The multivariate pmfs that can be expressed as a function of a linear
combination of their arguments

In this section we propose a special type of multivariate pmfs which will serve as basis vectors
to obtain a factorization of pmfs in $\mathcal{P}_{\mathbb{F}_q^N}$. We show in the next chapter that the factorization
obtained using these basis pmfs is quite useful. These basis pmfs are inspired by the parity
check relations in $\mathbb{F}_q$. Suppose that the components of an $\mathbb{F}_q^N$-valued random vector
$\mathbf{X} = [X_1, X_2, \ldots, X_N]$ satisfy the following parity check relation

$$
a_1 X_1 + a_2 X_2 + \ldots + a_N X_N = 0, \tag{3.7}
$$

where $a_i$ is a constant in $\mathbb{F}_q$. If all configurations satisfying this relation are assumed to be
equiprobable then the joint pmf of $\mathbf{X}$, which is denoted by $p(\mathbf{x})$, is

$$
p(\mathbf{x}) = \begin{cases} \dfrac{1}{q^{N-1}}, & \sum_{i} a_i x_i = 0 \\ 0, & \text{otherwise} \end{cases}
\tag{3.8}
$$

This pmf can be expressed in a more compact form as

$$
\begin{aligned}
p(\mathbf{x}) &= \frac{1}{q^{N-1}} \delta(\mathbf{a}\mathbf{x}^T) \qquad (3.9) \\
&= \mathcal{C}_{\mathbb{F}_q^N}\{\delta(\mathbf{a}\mathbf{x}^T)\}, \qquad (3.10)
\end{aligned}
$$

where $\mathbf{a}$ is $[a_1, a_2, \ldots, a_N]$ and $\delta(\cdot)$ denotes the Kronecker delta.


The multivariate pmfs which can be expressed in the form of (3.10) are called parity check
or zero-sum constraints. A parity check constraint depends only on the variables which have
nonzero coefficients associated with them. Hence, they possess local function properties as
we desire from a basis pmf. Therefore, parity check constraints could be good candidates
for being basis pmfs if they were elements of $\mathcal{P}_{\mathbb{F}_q^N}$. However, parity check constraints are not
elements of $\mathcal{P}_{\mathbb{F}_q^N}$, since their value is zero for the configurations which do not satisfy the parity
check relation.


We can obtain a "softened" version of the parity check constraints as follows. Suppose that
the components of the random vector $\mathbf{X}$ satisfy the following relation instead of (3.7)

$$
a_1 X_1 + a_2 X_2 + \ldots + a_N X_N = U, \tag{3.11}
$$

where $U$ is an $\mathbb{F}_q$-valued random variable distributed with an $r(u) \in \mathcal{P}_{\mathbb{F}_q}$. If all configurations
resulting in the same value of $U$ are assumed to be equiprobable then the joint pmf of $\mathbf{X}$ in this
case becomes

$$
p(\mathbf{x}) = \begin{cases}
\dfrac{1}{q^{N-1}} r(0), & \sum_{i} a_i x_i = 0 \\
\dfrac{1}{q^{N-1}} r(1), & \sum_{i} a_i x_i = 1 \\
\quad \vdots \\
\dfrac{1}{q^{N-1}} r(q - 1), & \sum_{i} a_i x_i = q - 1
\end{cases}
\tag{3.12}
$$

which can be expressed in a more compact form as

$$
\begin{aligned}
p(\mathbf{x}) &= \frac{1}{q^{N-1}} r(\mathbf{a}\mathbf{x}^T) \qquad (3.13) \\
&= \mathcal{C}_{\mathbb{F}_q^N}\{r(\mathbf{a}\mathbf{x}^T)\}. \qquad (3.14)
\end{aligned}
$$

Definition 3 A multivariate pmf $p(\mathbf{x})$ in $\mathcal{P}_{\mathbb{F}_q^N}$ is called a soft parity check (SPC) constraint if
there exist an $r(x) \in \mathcal{P}_{\mathbb{F}_q}$ and a vector $\mathbf{a} = [a_0, a_1, \ldots, a_{N-1}] \in \mathbb{F}_q^N$ such that

$$
p(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\{r(\mathbf{a}\mathbf{x}^T)\}. \tag{3.15}
$$

The vector $\mathbf{a}$ is called the parity check coefficient vector of the SPC constraint $p(\mathbf{x})$.

The difference between a parity check and an SPC constraint is the distribution of the weighted
sum of the random variables $X_0, X_1, \ldots, X_{N-1}$, which is denoted by $U$ in (3.11). $U$ is
distributed with $\delta(u)$ in the parity check case whereas it is distributed with an $r(u)$ in $\mathcal{P}_{\mathbb{F}_q}$ in the
SPC constraint case. The term "soft" arises from the fact that the weighted sum can take all
values with some probability rather than being guaranteed to be zero. Therefore, unlike parity check
constraints, SPC constraints are in $\mathcal{P}_{\mathbb{F}_q^N}$, since all configurations have nonzero probabilities.

Example 3.1 Let two pmfs in $\mathcal{P}_{\mathbb{F}_3^2}$ be given with a slight abuse of notation as

$$
p_1(x_0, x_1) = \frac{1}{30} \begin{bmatrix} 3 & 6 & 1 \\ 6 & 1 & 3 \\ 1 & 3 & 6 \end{bmatrix},
\qquad
p_2(x_0, x_1) = \frac{1}{157} \begin{bmatrix} 12 & 30 & 1 \\ 10 & 6 & 48 \\ 12 & 8 & 30 \end{bmatrix},
$$

where $p_k(x_0 = i, x_1 = j)$ is given by the entry in the $(i + 1)$th row and the $(j + 1)$th column of
the corresponding matrix.


Notice that $p_1(x_0, x_1)$ can be expressed as

$$
\begin{aligned}
p_1(x_0, x_1) &= \frac{1}{3} r(x_0 + x_1) \\
&= \mathcal{C}_{\mathbb{F}_3^2}\{r(x_0 + x_1)\}
\end{aligned}
$$

where $r(x) \in \mathcal{P}_{\mathbb{F}_3}$ is

$$
r(x) = \begin{cases} 0.3, & x = 0 \\ 0.6, & x = 1 \\ 0.1, & x = 2 \end{cases}
$$

Therefore, $p_1(x_0, x_1)$ is an SPC constraint with parity check coefficient vector $[1, 1]$. On the
other hand, we cannot find a similar expression for $p_2(x_0, x_1)$. Hence, $p_2(x_0, x_1)$ is not an
SPC constraint.
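The claims of Example 3.1 can be checked with exact rational arithmetic. The sketch below is illustrative: the helper names are hypothetical, the field arithmetic is plain arithmetic modulo a prime `Q`, and the second pmf is built from the integer entries 12, 30, 1 / 10, 6, 48 / 12, 8, 30 of the example, normalized by their sum. A pmf is an SPC constraint with coefficient vector `a` exactly when it is constant on every level set of $\mathbf{a}\mathbf{x}^T$.

```python
import itertools
from fractions import Fraction

Q = 3  # field size; arithmetic modulo Q is a field only when Q is prime

def level(a, x):
    """Value of the linear combination a x^T in F_Q."""
    return sum(ai * xi for ai, xi in zip(a, x)) % Q

def spc_constraint(r, a):
    """Normalized r(a x^T) as a dict keyed by configuration tuples."""
    values = {x: r[level(a, x)] for x in itertools.product(range(Q), repeat=len(a))}
    total = sum(values.values())
    return {x: v / total for x, v in values.items()}

def is_spc_for(p, a):
    """True if p is constant on each level set of a x^T."""
    seen = {}
    for x, v in p.items():
        if seen.setdefault(level(a, x), v) != v:
            return False
    return True

r = [Fraction(3, 10), Fraction(6, 10), Fraction(1, 10)]
p1 = spc_constraint(r, (1, 1))

entries = [[12, 30, 1], [10, 6, 48], [12, 8, 30]]
total = sum(map(sum, entries))
p2 = {(i, j): Fraction(entries[i][j], total) for i in range(Q) for j in range(Q)}

nonzero = [a for a in itertools.product(range(Q), repeat=2) if any(a)]
print([[int(p1[(i, j)] * 30) for j in range(Q)] for i in range(Q)])
print(any(is_spc_for(p2, a) for a in nonzero))
```

The first line printed is the integer matrix of $p_1$ scaled by 30, and the second reports whether any nonzero coefficient vector makes $p_2$ an SPC constraint.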


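Notice that we exploited the field structure of $\mathbb{F}_q$ in the discussion above. Parity check
relations could also be described in finite rings, but the number of configurations satisfying a
parity check relation depends on the parity check coefficients in a finite ring. Therefore, the
SPC constraints in a finite ring would not be in a nice form as above.

In the rest of this chapter we are going to show that SPC constraints form a complete set
of orthogonal basis functions for $\mathcal{P}_{\mathbb{F}_q^N}$. The first step of this process is the following lemma
which analyzes the inner product of two SPC constraints.

Lemma 3.1 Inner product of two SPC constraints: Let $p_1(\mathbf{x}), p_2(\mathbf{x}) \in \mathcal{P}_{\mathbb{F}_q^N}$ be two SPC
constraints such that

$$
\begin{aligned}
p_1(\mathbf{x}) &= \mathcal{C}_{\mathbb{F}_q^N}\{r_1(\mathbf{a}\mathbf{x}^T)\} \qquad (3.16) \\
p_2(\mathbf{x}) &= \mathcal{C}_{\mathbb{F}_q^N}\{r_2(\mathbf{b}\mathbf{x}^T)\},
\end{aligned}
$$

where $r_1(x), r_2(x) \in \mathcal{P}_{\mathbb{F}_q}$. If $\mathbf{a}$ and $\mathbf{b}$ are both nonzero vectors in $\mathbb{F}_q^N$ then

$$
\langle p_1(\mathbf{x}), p_2(\mathbf{x}) \rangle = \begin{cases}
q^{N-1} \langle r_1(x), r_2(\alpha x) \rangle, & \exists \alpha \in \mathbb{F}_q : \mathbf{b} = \alpha \mathbf{a} \\
0, & \text{otherwise}
\end{cases}
\tag{3.17}
$$

The proof of this lemma is given in Appendix A.2.1.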


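3.4 Orthogonal Subspace Decomposition of $\mathcal{P}_{\mathbb{F}_q^N}$

Generating an SPC constraint in $\mathcal{P}_{\mathbb{F}_q^N}$ based on a pmf in $\mathcal{P}_{\mathbb{F}_q}$ and a parity check coefficient
vector $\mathbf{a}$ can be viewed as a mapping from $\mathcal{P}_{\mathbb{F}_q}$ to $\mathcal{P}_{\mathbb{F}_q^N}$ parameterized on $\mathbf{a}$ as given below.

$$
\mathcal{S}_{\mathbf{a}}\{p(x)\} : \mathcal{P}_{\mathbb{F}_q} \to \mathcal{P}_{\mathbb{F}_q^N} \triangleq \mathcal{C}_{\mathbb{F}_q^N}\{p(\mathbf{a}\mathbf{x}^T)\} \tag{3.18}
$$

Any SPC constraint with parity check coefficient vector $\mathbf{a}$ is in $\operatorname{im}\{\mathcal{S}_{\mathbf{a}}\}$. The first of the
following pair of lemmas states that $\operatorname{im}\{\mathcal{S}_{\mathbf{a}}\}$ is a subspace of $\mathcal{P}_{\mathbb{F}_q^N}$ and the second one investigates
the relation between two such subspaces.

Lemma 3.2 For any nonzero parity check coefficient vector $\mathbf{a}$ in $\mathbb{F}_q^N$, $\operatorname{im}\{\mathcal{S}_{\mathbf{a}}\}$ is a $q - 1$
dimensional subspace of $\mathcal{P}_{\mathbb{F}_q^N}$.

Lemma 3.3 For any two nonzero parity check coefficient vectors $\mathbf{a}, \mathbf{b} \in \mathbb{F}_q^N$

$$
\exists \alpha \in \mathbb{F}_q : \mathbf{a} = \alpha \mathbf{b} \implies \operatorname{im}\{\mathcal{S}_{\mathbf{a}}\} = \operatorname{im}\{\mathcal{S}_{\mathbf{b}}\} \tag{3.19}
$$

$$
\nexists \alpha \in \mathbb{F}_q : \mathbf{a} = \alpha \mathbf{b} \implies \operatorname{im}\{\mathcal{S}_{\mathbf{a}}\} \perp \operatorname{im}\{\mathcal{S}_{\mathbf{b}}\} \tag{3.20}
$$

The proofs of these lemmas are given in Appendix A.2.2 and Appendix A.2.3 respectively.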
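The orthogonality statements can be checked numerically for a small case. The sketch below is illustrative and assumes a prime field size `Q`, so that arithmetic modulo `Q` is field arithmetic; the univariate pmfs `r1`, `r2` and all function names are hypothetical. It compares the inner product of two SPC constraints with linearly dependent coefficient vectors against $q^{N-1} \langle r_1(x), r_2(\alpha x) \rangle$, and evaluates the inner product for a linearly independent pair.

```python
import itertools
import math

Q, N = 3, 2

def spc_constraint(r, a):
    """Normalized r(a x^T) on F_Q^N as a dict keyed by configuration tuples."""
    values = {x: r[sum(ai * xi for ai, xi in zip(a, x)) % Q]
              for x in itertools.product(range(Q), repeat=N)}
    total = sum(values.values())
    return {x: v / total for x, v in values.items()}

def inner(p, r):
    """Inner product of two pmfs stored as dicts over the same keys."""
    lp = [math.log(p[k]) for k in p]
    lr = [math.log(r[k]) for k in p]
    return sum(a * b for a, b in zip(lp, lr)) - sum(lp) * sum(lr) / len(lp)

r1 = [0.3, 0.6, 0.1]
r2 = [0.2, 0.5, 0.3]

# b = 2a: linearly dependent coefficient vectors
dependent = inner(spc_constraint(r1, (1, 1)), spc_constraint(r2, (2, 2)))
# [1, 1] and [1, 2] are linearly independent
independent = inner(spc_constraint(r1, (1, 1)), spc_constraint(r2, (1, 2)))

r2_scaled = {u: r2[(2 * u) % Q] for u in range(Q)}          # r2(alpha x) with alpha = 2
expected = Q ** (N - 1) * inner(dict(enumerate(r1)), r2_scaled)
print(dependent, expected, independent)
```

The first two printed numbers should coincide and the third should be zero up to rounding.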

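Lemma 3.3 suggests that $\mathcal{P}_{\mathbb{F}_q^N}$ can be decomposed into orthogonal subspaces by using a
sufficient number of parity check coefficient vectors which are all pairwise linearly independent.
Fortunately, we can borrow such a set of parity check coefficient vectors from coding theory
as explained by the following theorem.

Theorem 3.4 There exists a set $\mathcal{H}$ of pairwise linearly independent parity check vectors in
$\mathbb{F}_q$ of length $N$ such that

$$
\bigoplus_{\mathbf{a} \in \mathcal{H}} \operatorname{im}\{\mathcal{S}_{\mathbf{a}}\} = \mathcal{P}_{\mathbb{F}_q^N}, \tag{3.21}
$$

where $\bigoplus$ denotes orthogonal direct summation.

Proof. For all nonzero $\mathbf{a}$, $\operatorname{im}\{\mathcal{S}_{\mathbf{a}}\}$ is a subspace of $\mathcal{P}_{\mathbb{F}_q^N}$. Orthogonal direct sum of subspaces is
again a subspace of $\mathcal{P}_{\mathbb{F}_q^N}$. Therefore, we can complete the proof by finding an $\mathcal{H}$ which makes

$$
\dim \bigoplus_{\mathbf{a} \in \mathcal{H}} \operatorname{im}\{\mathcal{S}_{\mathbf{a}}\} = \dim \mathcal{P}_{\mathbb{F}_q^N}. \tag{3.22}
$$

Let the elements of $\mathcal{H}$ be selected by transposing the columns of the parity check matrix of
the Hamming code in $\mathbb{F}_q$ with $N$ rows. It is known from coding theory that the parity check
matrix of such a Hamming code consists of $\frac{q^N - 1}{q - 1}$ columns all of which are pairwise linearly
independent [11]. Therefore, $\mathcal{H}$ contains $\frac{q^N - 1}{q - 1}$ pairwise linearly independent vectors. Since
these vectors are pairwise linearly independent, for any distinct $\mathbf{a}, \mathbf{b} \in \mathcal{H}$

$$
\operatorname{im}\{\mathcal{S}_{\mathbf{a}}\} \perp \operatorname{im}\{\mathcal{S}_{\mathbf{b}}\} \tag{3.23}
$$

due to Lemma 3.3. Hence,

$$
\dim \bigoplus_{\mathbf{a} \in \mathcal{H}} \operatorname{im}\{\mathcal{S}_{\mathbf{a}}\} = \sum_{\mathbf{a} \in \mathcal{H}} \dim \operatorname{im}\{\mathcal{S}_{\mathbf{a}}\}, \tag{3.24}
$$

since these subspaces are all orthogonal. $\operatorname{im}\{\mathcal{S}_{\mathbf{a}}\}$ is a $q - 1$ dimensional subspace due to Lemma
3.2. Therefore,

$$
\begin{aligned}
\dim \bigoplus_{\mathbf{a} \in \mathcal{H}} \operatorname{im}\{\mathcal{S}_{\mathbf{a}}\} &= \sum_{\mathbf{a} \in \mathcal{H}} (q - 1) \\
&= |\mathcal{H}| (q - 1) \\
&= q^N - 1 \\
&= \dim \mathcal{P}_{\mathbb{F}_q^N},
\end{aligned}
\tag{3.25}
$$

which completes the proof.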
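The counting argument in the proof can be illustrated by enumerating one possible set of pairwise linearly independent vectors. The sketch below assumes a prime $q$ and picks, from every line through the origin of $\mathbb{F}_q^N$, the representative whose first nonzero entry is 1; this is one common way to list the columns of a Hamming parity check matrix, and the function name is hypothetical.

```python
import itertools

def parity_check_set(q, N):
    """One representative per line through the origin of F_q^N (q prime):
    the vectors whose first nonzero entry equals 1."""
    reps = []
    for a in itertools.product(range(q), repeat=N):
        nonzero = [v for v in a if v != 0]
        if nonzero and nonzero[0] == 1:
            reps.append(a)
    return reps

for q, N in [(2, 3), (3, 2), (3, 3), (5, 2)]:
    H = parity_check_set(q, N)
    # |H| = (q^N - 1)/(q - 1), and |H| (q - 1) matches the dimension q^N - 1
    print(q, N, len(H), len(H) * (q - 1), q ** N - 1)
```

For every pair $(q, N)$ the last two printed numbers coincide, which is the dimension count used in (3.25).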


3.5 The Canonical Factorization 


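Corollary 3.5 (The fundamental result of the thesis:) Any multivariate pmf in $\mathcal{P}_{\mathbb{F}_q^N}$ can be
expressed as a product of functions that depend on a linear combination of their arguments.


Proof. Let $\mathcal{H}$ be a set of parity check vectors satisfying (3.21), the existence of which is guaranteed
by Theorem 3.4. Let the vectors in $\mathcal{H}$ be enumerated as $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_{|\mathcal{H}|}$. Then any $p(\mathbf{x}) \in \mathcal{P}_{\mathbb{F}_q^N}$
can be expressed as

$$
p(\mathbf{x}) = \mathop{\boxplus}_{i=1}^{|\mathcal{H}|} p_i(\mathbf{x}) \tag{3.26}
$$

where $p_i(\mathbf{x})$ is the projection of $p(\mathbf{x})$ onto $\operatorname{im}\{\mathcal{S}_{\mathbf{a}_i}\}$. Since $p_i(\mathbf{x})$ is in $\operatorname{im}\{\mathcal{S}_{\mathbf{a}_i}\}$, there exists an
$r_i(x) \in \mathcal{P}_{\mathbb{F}_q}$ such that

$$
p_i(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\{r_i(\mathbf{a}_i \mathbf{x}^T)\}. \tag{3.27}
$$

Then $p(\mathbf{x})$ can be expressed as

$$
p(\mathbf{x}) = \mathop{\boxplus}_{i=1}^{|\mathcal{H}|} \mathcal{C}_{\mathbb{F}_q^N}\{r_i(\mathbf{a}_i \mathbf{x}^T)\}. \tag{3.28}
$$

Employing the definition of addition in $\mathcal{P}_{\mathbb{F}_q^N}$ yields the desired factorization.

$$
\begin{aligned}
p(\mathbf{x}) &= \mathcal{C}_{\mathbb{F}_q^N}\Big\{ \prod_{i=1}^{|\mathcal{H}|} r_i(\mathbf{a}_i \mathbf{x}^T) \Big\} \qquad (3.29) \\
&= \frac{1}{\gamma} \prod_{i=1}^{|\mathcal{H}|} r_i(\mathbf{a}_i \mathbf{x}^T), \qquad (3.30)
\end{aligned}
$$

where $\gamma$ is equal to $\sum_{\mathbf{x} \in \mathbb{F}_q^N} \prod_{i=1}^{|\mathcal{H}|} r_i(\mathbf{a}_i \mathbf{x}^T)$.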

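Definition 4 The canonical factorization: A factorization of a multivariate pmf is called the
canonical factorization of the pmf if all factor functions are SPC factors and parity check
coefficient vectors of all SPC factors are pairwise linearly independent.


The canonical factorization of a multivariate pmf in $\mathcal{P}_{\mathbb{F}_q^N}$ can be obtained by projecting the
pmf onto the subspaces $\operatorname{im}\{\mathcal{S}_{\mathbf{a}_i}\}$ for $\mathbf{a}_i \in \mathcal{H}$. In order to compute this projection a set of
orthonormal basis pmfs for $\operatorname{im}\{\mathcal{S}_{\mathbf{a}_i}\}$ is required. We can derive such a set of orthonormal basis
pmfs from the orthonormal basis pmfs for $\mathcal{P}_{\mathbb{F}_q}$ given in Section 2.5.3 by using the first part
of Lemma 3.1. The inner product of two SPC constraints $\mathcal{C}_{\mathbb{F}_q^N}\{s_j(\mathbf{a}_i \mathbf{x}^T)\}$ and $\mathcal{C}_{\mathbb{F}_q^N}\{s_k(\mathbf{a}_i \mathbf{x}^T)\}$,
which are derived from $s_j(x)$ and $s_k(x)$ defined in (2.39), is

$$
\langle \mathcal{C}_{\mathbb{F}_q^N}\{s_j(\mathbf{a}_i \mathbf{x}^T)\}, \mathcal{C}_{\mathbb{F}_q^N}\{s_k(\mathbf{a}_i \mathbf{x}^T)\} \rangle = q^{N-1} \langle s_j(x), s_k(x) \rangle \tag{3.31}
$$

due to Lemma 3.1. Consequently,

$$
\langle \mathcal{C}_{\mathbb{F}_q^N}\{s_j(\mathbf{a}_i \mathbf{x}^T)\}, \mathcal{C}_{\mathbb{F}_q^N}\{s_k(\mathbf{a}_i \mathbf{x}^T)\} \rangle = \begin{cases} q^{N-1}, & k = j \\ 0, & k \neq j \end{cases}
\tag{3.32}
$$

Therefore, the set given below is a set of orthonormal basis pmfs for $\operatorname{im}\{\mathcal{S}_{\mathbf{a}_i}\}$.

$$
\Big\{ q^{-\frac{N-1}{2}} \boxtimes \mathcal{C}_{\mathbb{F}_q^N}\{s_1(\mathbf{a}_i \mathbf{x}^T)\}, \; q^{-\frac{N-1}{2}} \boxtimes \mathcal{C}_{\mathbb{F}_q^N}\{s_2(\mathbf{a}_i \mathbf{x}^T)\}, \; \ldots, \; q^{-\frac{N-1}{2}} \boxtimes \mathcal{C}_{\mathbb{F}_q^N}\{s_{q-1}(\mathbf{a}_i \mathbf{x}^T)\} \Big\}
\tag{3.33}
$$

Then the projection of $p(\mathbf{x})$ onto $\operatorname{im}\{\mathcal{S}_{\mathbf{a}_i}\}$, which is denoted by $\mathcal{C}_{\mathbb{F}_q^N}\{r_i(\mathbf{a}_i \mathbf{x}^T)\}$, can be obtained
as

$$
\mathcal{C}_{\mathbb{F}_q^N}\{r_i(\mathbf{a}_i \mathbf{x}^T)\} = \mathop{\boxplus}_{j=1}^{q-1} \frac{1}{q^{N-1}} \langle \mathcal{C}_{\mathbb{F}_q^N}\{s_j(\mathbf{a}_i \mathbf{x}^T)\}, p(\mathbf{x}) \rangle \boxtimes \mathcal{C}_{\mathbb{F}_q^N}\{s_j(\mathbf{a}_i \mathbf{x}^T)\}. \tag{3.34}
$$

Moreover, due to the linearity of the mapping $\mathcal{S}_{\mathbf{a}_i}\{\cdot\}$

$$
r_i(x) = \mathop{\boxplus}_{j=1}^{q-1} \frac{1}{q^{N-1}} \langle \mathcal{C}_{\mathbb{F}_q^N}\{s_j(\mathbf{a}_i \mathbf{x}^T)\}, p(\mathbf{x}) \rangle \boxtimes s_j(x). \tag{3.35}
$$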

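Example 3.2 Suppose that we are required to find the canonical factorization of $p_2(x_0, x_1)$
given in Example 3.1. We can decompose $\mathcal{P}_{\mathbb{F}_3^2}$ into orthogonal subspaces with a set $\mathcal{H}$
containing $\frac{3^2 - 1}{3 - 1} = 4$ pairwise linearly independent parity check vectors of length two. Such an $\mathcal{H}$
can be selected as

$$
\mathcal{H} = \{[1, 0], [0, 1], [1, 1], [1, 2]\} \tag{3.36}
$$

The subspaces of $\mathcal{P}_{\mathbb{F}_3^2}$ based on these parity check vectors are

$$
\operatorname{im}\{\mathcal{S}_{[1,0]}\} = \left\{ p(x_0, x_1) = \frac{1}{3} r(x_0) = \frac{1}{3} \begin{bmatrix} r(0) & r(0) & r(0) \\ r(1) & r(1) & r(1) \\ r(2) & r(2) & r(2) \end{bmatrix} : r(x) \in \mathcal{P}_{\mathbb{F}_3} \right\}
$$

$$
\operatorname{im}\{\mathcal{S}_{[0,1]}\} = \left\{ p(x_0, x_1) = \frac{1}{3} r(x_1) = \frac{1}{3} \begin{bmatrix} r(0) & r(1) & r(2) \\ r(0) & r(1) & r(2) \\ r(0) & r(1) & r(2) \end{bmatrix} : r(x) \in \mathcal{P}_{\mathbb{F}_3} \right\}
$$

$$
\operatorname{im}\{\mathcal{S}_{[1,1]}\} = \left\{ p(x_0, x_1) = \frac{1}{3} r(x_0 + x_1) = \frac{1}{3} \begin{bmatrix} r(0) & r(1) & r(2) \\ r(1) & r(2) & r(0) \\ r(2) & r(0) & r(1) \end{bmatrix} : r(x) \in \mathcal{P}_{\mathbb{F}_3} \right\}
$$

$$
\operatorname{im}\{\mathcal{S}_{[1,2]}\} = \left\{ p(x_0, x_1) = \frac{1}{3} r(x_0 + 2 x_1) = \frac{1}{3} \begin{bmatrix} r(0) & r(2) & r(1) \\ r(1) & r(0) & r(2) \\ r(2) & r(1) & r(0) \end{bmatrix} : r(x) \in \mathcal{P}_{\mathbb{F}_3} \right\}
$$

Let the projections of $p_2(x_0, x_1)$ onto these subspaces be denoted with $\frac{1}{3} r_1(x_0)$, $\frac{1}{3} r_2(x_1)$, $\frac{1}{3} r_3(x_0 + x_1)$,
and $\frac{1}{3} r_4(x_0 + 2 x_1)$ respectively. These pmfs can be computed using (3.35) as

$$
r_1(x) = \begin{cases} 0.2, & x = 0 \\ 0.4, & x = 1 \\ 0.4, & x = 2 \end{cases},
\qquad
r_2(x) = \begin{cases} \frac{1}{3}, & x = 0 \\ \frac{1}{3}, & x = 1 \\ \frac{1}{3}, & x = 2 \end{cases}
$$

$$
r_3(x) = \begin{cases} 0.4, & x = 0 \\ 0.5, & x = 1 \\ 0.1, & x = 2 \end{cases},
\qquad
r_4(x) = \begin{cases} 0.3, & x = 0 \\ 0.1, & x = 1 \\ 0.6, & x = 2 \end{cases}
$$

Finally, it can be verified that

$$
p_2(x_0, x_1) = \frac{1500}{157} r_1(x_0) r_2(x_1) r_3(x_0 + x_1) r_4(x_0 + 2 x_1).
$$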
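The canonical factorization of this example can be recomputed with a short script. The sketch below is an illustration under stated assumptions rather than the procedure of the thesis: instead of expanding over the basis pmfs as in (3.35), it uses the equivalent observation that projecting onto $\operatorname{im}\{\mathcal{S}_{\mathbf{a}}\}$ averages the log-pmf over each level set of $\mathbf{a}\mathbf{x}^T$, so each factor is a normalized geometric mean. The pmf is built from the integer entries 12, 30, 1 / 10, 6, 48 / 12, 8, 30 with rows indexed by $x_0$ and columns by $x_1$; all names are hypothetical.

```python
import math

q, N = 3, 2
H = [(1, 0), (0, 1), (1, 1), (1, 2)]
M = [[12, 30, 1], [10, 6, 48], [12, 8, 30]]
total = sum(map(sum, M))
p2 = {(i, j): M[i][j] / total for i in range(q) for j in range(q)}

def level(a, x):
    """Value of a x^T in F_q (q prime)."""
    return sum(ai * xi for ai, xi in zip(a, x)) % q

def project(p, a):
    """Univariate pmf r such that the normalized r(a x^T) is the projection of p:
    r(u) is proportional to the geometric mean of p over {x : a x^T = u}."""
    logs = [0.0] * q
    for x, v in p.items():
        logs[level(a, x)] += math.log(v) / q ** (N - 1)
    w = [math.exp(l) for l in logs]
    s = sum(w)
    return [v / s for v in w]

factors = [project(p2, a) for a in H]

def factor_product(x):
    """Unnormalized product of the four SPC factors at configuration x."""
    value = 1.0
    for a, r in zip(H, factors):
        value *= r[level(a, x)]
    return value

gamma = sum(factor_product(x) for x in p2)
rebuilt = {x: factor_product(x) / gamma for x in p2}
print([[round(v, 4) for v in r] for r in factors], 1.0 / gamma)
```

The printed factors can be compared with $r_1, \ldots, r_4$ of the example, and `1.0 / gamma` with the constant in front of the product; `rebuilt` should coincide with `p2` up to rounding.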
CHAPTER 4


PROPERTIES AND SPECIAL CASES OF
THE CANONICAL FACTORIZATION


4.1 Introduction

The canonical factorization deserves its name by possessing some important properties. This
chapter explains these properties first and then derives some special cases of the canonical
factorization. These special cases will be important while applying the canonical factorization
to communication theory problems in a later chapter. This chapter begins with introducing
a matrix notation to represent local functions in Section 4.2. Then it is shown in Section 4.3
that the canonical factorization is the ultimate factorization possible. The uniqueness of the
canonical factorization is explained in Section 4.4. The canonical factorization of pmfs with
known alternative factorizations is derived in Section 4.5. This chapter ends with deriving the
canonical factorization of the joint pmf of a random vector obtained by linear transformation of
another random vector.

4.2 Representation of local functions 

In the rest of the thesis we deal frequently with local functions. We adopt a matrix notation
to indicate the variables that a factor function depends on. We use $\mathbb{F}_q$-valued diagonal matrices
such that some of their entries on the main diagonal are 1 and the rest are all 0. For instance,

$$
p(\mathbf{x}) = p(\mathbf{x}\mathbf{D}) \tag{4.1}
$$

indicates that the pmf $p(\mathbf{x})$ depends only on the components of $\mathbf{x}$ associated with a 1 on
the diagonal of the matrix $\mathbf{D}$. We call such matrices dependency matrices. Some special
dependency matrices we use in the thesis are $\mathbf{E}_i$, $\mathbf{I}$, and $\mathbf{O}$. $\mathbf{E}_i$ denotes the dependency matrix
with a 1 only on the $i$th entry of its diagonal. The other two matrices are the identity matrix
and the all-zeros matrix respectively.

A local pmf is orthogonal to some SPC constraints as shown by the following lemma. This
lemma is quite useful not only in this chapter but also in Chapter 7.

Lemma 4.1 For any $p(\mathbf{x}) \in \mathcal{P}_{\mathbb{F}_q^N}$, any nonzero $\mathbf{a} \in \mathbb{F}_q^N$, and any dependency matrix $\mathbf{D}$

$$
p(\mathbf{x}) = p(\mathbf{x}\mathbf{D}) \wedge \mathbf{a}\mathbf{D} \neq \mathbf{a} \implies p(\mathbf{x}) \perp \operatorname{im}\{\mathcal{S}_{\mathbf{a}}\}. \tag{4.2}
$$

The proof is given in Appendix A.3.1.
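Lemma 4.1 can be illustrated on a small binary case. In the sketch below, which uses hypothetical names and values, the pmf `local` depends only on $(x_0, x_1)$, corresponding to a dependency matrix with diagonal $(1, 1, 0)$. The coefficient vector $[0, 1, 1]$ involves $x_2$, so $\mathbf{a}\mathbf{D} \neq \mathbf{a}$ and the inner product with any SPC constraint built on it should vanish; for $[1, 1, 0]$ we have $\mathbf{a}\mathbf{D} = \mathbf{a}$ and the lemma makes no claim.

```python
import itertools
import math

Q, N = 2, 3
CONFIGS = list(itertools.product(range(Q), repeat=N))

def normalize(f):
    """Scale a strictly positive function to sum to one."""
    total = sum(f.values())
    return {x: v / total for x, v in f.items()}

def spc_constraint(r, a):
    """Normalized r(a x^T) on F_Q^N."""
    return normalize({x: r[sum(ai * xi for ai, xi in zip(a, x)) % Q] for x in CONFIGS})

def inner(p, r):
    """Inner product of two multivariate pmfs stored as dicts."""
    lp = [math.log(p[x]) for x in CONFIGS]
    lr = [math.log(r[x]) for x in CONFIGS]
    return sum(a * b for a, b in zip(lp, lr)) - sum(lp) * sum(lr) / len(CONFIGS)

# Local pmf: a function of (x0, x1) only, i.e. D = diag(1, 1, 0).
table = [[1.0, 2.0], [3.0, 5.0]]
local = normalize({x: table[x[0]][x[1]] for x in CONFIGS})

r = [0.8, 0.2]
outside = inner(local, spc_constraint(r, (0, 1, 1)))   # aD = (0, 1, 0) differs from a
inside = inner(local, spc_constraint(r, (1, 1, 0)))    # aD equals a
print(outside, inside)
```

The first printed value should be zero up to rounding, while the second is generally nonzero.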

4.3 Ultimateness of the canonical factorization 

The ultimate goal of any mathematical factorization operation is to factor the mathematical 
object to its most basic building blocks. For instance, the goal of integer factorization is 
to express a natural number as a product of prime numbers. Similarly, the ultimate goal of 
polynomial factorization is to express a polynomial as a product of irreducible polynomials. 

In the case of factoring a strictly positive multivariate pmf into strictly positive factor
functions, it is difficult to set an ultimate goal or to describe the most basic building blocks of
multivariate pmfs. Since any factor function in any factorization can still be expressed as
a product of other positive factor functions, a multivariate pmf can be factored in arbitrarily
many different ways and the factorization operation can continue indefinitely. In this aspect,
factoring a strictly positive pmf is similar to trying to factor a real number.

However, not every factorization is useful in practice. A factorization of a multivariate pmf
is useful if it expresses the pmf as a product of local functions. Therefore, it is reasonable to
continue to factor a multivariate pmf if any factor function can still be expressed as a product
of more local factor functions. For instance, let a factor function $\phi(\mathbf{x}\mathbf{D})$ of $p(\mathbf{x})$ be expressed
as

$$
\phi(\mathbf{x}\mathbf{D}) = \phi_1(\mathbf{x}\mathbf{D}_1) \phi_2(\mathbf{x}\mathbf{D}_2)
$$

where $\mathbf{D}_i \neq \mathbf{D}$ but $\mathbf{D}\mathbf{D}_i = \mathbf{D}_i$ for $i = 1, 2$. Since $\phi_1(\cdot)$ and $\phi_2(\cdot)$ have fewer arguments
than $\phi(\cdot)$ has, the ultimate factorization of $p(\mathbf{x})$ should contain the product $\phi_1(\mathbf{x}\mathbf{D}_1) \phi_2(\mathbf{x}\mathbf{D}_2)$
rather than $\phi(\mathbf{x}\mathbf{D})$. From this point of view, the canonical factorization is the ultimate factorization
that one can achieve, as stated by the following theorem.

Theorem 4.2 An SPC factor function with a nonzero norm cannot be factored further into
functions having fewer arguments.

Proof. Assume that an SPC constraint, $\mathcal{C}_{\mathbb{F}_q^N}\{r(\mathbf{a}\mathbf{x}^T)\}$, with a nonzero norm can be factored into
functions having fewer arguments. In other words, assume that $\mathcal{C}_{\mathbb{F}_q^N}\{r(\mathbf{a}\mathbf{x}^T)\}$ can be
expressed as

$$
\begin{aligned}
\mathcal{C}_{\mathbb{F}_q^N}\{r(\mathbf{a}\mathbf{x}^T)\} &= \phi_1(\mathbf{x}\mathbf{D}_1) \phi_2(\mathbf{x}\mathbf{D}_2) \\
&= \mathcal{C}_{\mathbb{F}_q^N}\{\phi_1(\mathbf{x}\mathbf{D}_1)\} \boxplus \mathcal{C}_{\mathbb{F}_q^N}\{\phi_2(\mathbf{x}\mathbf{D}_2)\},
\end{aligned}
\tag{4.3}
$$

where $\mathbf{D}_1$ and $\mathbf{D}_2$ are dependency matrices such that $\mathbf{a}\mathbf{D}_1 \neq \mathbf{a}$ and $\mathbf{a}\mathbf{D}_2 \neq \mathbf{a}$. $\mathcal{C}_{\mathbb{F}_q^N}\{\phi_1(\mathbf{x}\mathbf{D}_1)\}$ and
$\mathcal{C}_{\mathbb{F}_q^N}\{\phi_2(\mathbf{x}\mathbf{D}_2)\}$ are orthogonal to $\mathcal{C}_{\mathbb{F}_q^N}\{r(\mathbf{a}\mathbf{x}^T)\}$ due to Lemma 4.1. Then (4.3) is only possible
if

$$
\mathcal{C}_{\mathbb{F}_q^N}\{r(\mathbf{a}\mathbf{x}^T)\} = \mathcal{C}_{\mathbb{F}_q^N}\{\phi_1(\mathbf{x}\mathbf{D}_1)\} = \mathcal{C}_{\mathbb{F}_q^N}\{\phi_2(\mathbf{x}\mathbf{D}_2)\} = \theta(\mathbf{x}),
$$

which contradicts the assumption of a nonzero norm and completes the proof. ■

4.4 Uniqueness of the canonical factorization 

„A'_i ^ 

Recall that we need a set PI composed of pairwise linearly independent vectors in 
to derive the canonical factorization of a pmf in 'P^n. There are 2^ - 1 nonzero vectors in 
all of which are pairwise linearly independent. Hence, the set PI should contain all the 
nonzero vectors in F^. Consequently, the set PI required in the derivation of the canonical 
factorization of a pmf in (Ppw is unique. Moreover, the canonical factorization obtained from 
such a set PI is also unique. 

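If $\mathbb{F}_q$ is not the binary field then there are $q^N - 1$ nonzero vectors in $\mathbb{F}_q^N$. Hence, we can
have more than one distinct set containing $\frac{q^N-1}{q-1}$ pairwise linearly independent vectors in
$\mathbb{F}_q^N$ if $q$ is not equal to two. Let $\mathcal{H}_1 = \{\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_M\}$ and $\mathcal{H}_2 = \{\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_M\}$ be two
distinct sets containing $M = \frac{q^N-1}{q-1}$ pairwise linearly independent vectors in $\mathbb{F}_q^N$. Using these
two sets we can obtain two different canonical factorizations of a multivariate pmf $p(\mathbf{x}) \in \mathcal{P}_{\mathbb{F}_q^N}$
as in

$$p(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{i=1}^{M} r_i(\mathbf{a}_i\mathbf{x}^T)\Big\}, \tag{4.4}$$

$$p(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{i=1}^{M} t_i(\mathbf{b}_i\mathbf{x}^T)\Big\}, \tag{4.5}$$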

where $r_i(\mathbf{a}_i\mathbf{x}^T)$ and $t_i(\mathbf{b}_i\mathbf{x}^T)$ denote the projections of $p(\mathbf{x})$ onto $\operatorname{im}\{\mathcal{S}_{\mathbf{a}_i}\}$ and $\operatorname{im}\{\mathcal{S}_{\mathbf{b}_i}\}$
respectively. Notice that any $\mathbf{a}_i$ in $\mathcal{H}_1$ is necessarily linearly dependent with one of the $\mathbf{b}_j$ vectors in
$\mathcal{H}_2$. In other words, for any $\mathbf{a}_i \in \mathcal{H}_1$ there exists a $\mathbf{b}_j \in \mathcal{H}_2$ such that

$$\mathbf{b}_j = \alpha\,\mathbf{a}_i. \tag{4.6}$$

Consequently, $\operatorname{im}\{\mathcal{S}_{\mathbf{a}_i}\}$ is equal to $\operatorname{im}\{\mathcal{S}_{\mathbf{b}_j}\}$ due to Lemma 3.3. Therefore, the projections of
$p(\mathbf{x})$ onto these identical subspaces should also be equal, i.e.,

$$r_i(\mathbf{a}_i\mathbf{x}^T) = t_j(\mathbf{b}_j\mathbf{x}^T), \tag{4.7}$$

which means that the factorizations in (4.4) and (4.5) are essentially the same factorization
although they appear different. Since different sets of parity check coefficient vectors lead to
the same canonical factorization, we can conclude that the canonical factorization of a given
pmf is unique. Since the selection of the vectors in $\mathcal{H}$ does not affect the resulting canonical
factorization, in the rest of the thesis we use $\mathcal{H}$ to denote any set containing $\frac{q^N-1}{q-1}$ pairwise
linearly independent vectors in $\mathbb{F}_q^N$.
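
The counting argument above can be checked numerically. The following is a minimal sketch, assuming a prime $q$ so that arithmetic modulo $q$ realizes $\mathbb{F}_q$; the helper names `normalized_set` and `scale` are illustrative and do not come from the thesis. It builds one set of pairwise linearly independent vectors, derives a second set by scaling, and confirms that every vector of the second set is a scalar multiple of a vector of the first, as in (4.6).

```python
from itertools import product

def normalized_set(q, n):
    """One representative per line through the origin of F_q^n (q prime):
    the nonzero vectors whose first nonzero entry equals 1."""
    reps = []
    for v in product(range(q), repeat=n):
        nonzero = [c for c in v if c != 0]
        if nonzero and nonzero[0] == 1:
            reps.append(v)
    return reps

def scale(alpha, v, q):
    """Multiply the vector v by the scalar alpha in F_q."""
    return tuple((alpha * c) % q for c in v)

q, n = 3, 2
H1 = normalized_set(q, n)               # one valid choice of the set
H2 = [scale(2, v, q) for v in H1]       # a different choice of representatives

# Every b in H2 is alpha * a for some a in H1 and some nonzero alpha.
matched = all(
    any(scale(alpha, a, q) == b for a in H1 for alpha in range(1, q))
    for b in H2
)
```

For $q = 2$ the only nonzero scalar is 1, so the same construction returns all nonzero vectors and no second choice exists.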


4.5 The canonical factorization of pmfs with alternative factorizations 

In the most general case, the canonical factorization of a multivariate pmf in $\mathcal{P}_{\mathbb{F}_q^N}$ is composed
of $|\mathcal{H}|$ SPC factors. However, for some special pmfs some of these $|\mathcal{H}|$ SPC factors are
essentially constants. For these pmfs fewer than $|\mathcal{H}|$ SPC factors suffice to express the
canonical factorization.

The first group of these special types of pmfs consists of pmfs which depend on only a subset 
of their arguments. The canonical factorization of these types of pmfs is investigated in the 
following lemma. 

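Lemma 4.3 Let $\mathcal{D}$ be a subset of $\mathcal{H}$ defined for a dependency matrix $\mathbf{D}$ as

$$\mathcal{D} = \{\mathbf{a}_i \in \mathcal{H} : \mathbf{a}_i\mathbf{D} = \mathbf{a}_i\}. \tag{4.8}$$

The canonical factorization of a multivariate pmf $p(\mathbf{x}) \in \mathcal{P}_{\mathbb{F}_q^N}$ is in the form of

$$p(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{\mathbf{a}_i \in \mathcal{D}} r_i(\mathbf{a}_i\mathbf{x}^T)\Big\} \tag{4.9}$$

if and only if

$$p(\mathbf{x}) = p(\mathbf{x}\mathbf{D}). \tag{4.10}$$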


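Proof. Due to Theorem 3.4 any pmf in $\mathcal{P}_{\mathbb{F}_q^N}$ can be expressed as

$$p(\mathbf{x}) = \mathop{\boxplus}_{\mathbf{a}_i \in \mathcal{H}} \mathcal{C}_{\mathbb{F}_q^N}\{r_i(\mathbf{a}_i\mathbf{x}^T)\} \tag{4.11}$$

$$= \mathop{\boxplus}_{\mathbf{a}_i \in \mathcal{D}} \mathcal{C}_{\mathbb{F}_q^N}\{r_i(\mathbf{a}_i\mathbf{x}^T)\} \;\boxplus \mathop{\boxplus}_{\mathbf{a}_i \in \mathcal{H}\setminus\mathcal{D}} \mathcal{C}_{\mathbb{F}_q^N}\{r_i(\mathbf{a}_i\mathbf{x}^T)\}, \tag{4.12}$$

where $\mathcal{C}_{\mathbb{F}_q^N}\{r_i(\mathbf{a}_i\mathbf{x}^T)\}$ is the projection of $p(\mathbf{x})$ onto $\operatorname{im}\{\mathcal{S}_{\mathbf{a}_i}\}$. If $p(\mathbf{x}) = p(\mathbf{x}\mathbf{D})$, then $p(\mathbf{x})$ is
orthogonal to $\operatorname{im}\{\mathcal{S}_{\mathbf{a}_i}\}$ for $\mathbf{a}_i \in \mathcal{H}\setminus\mathcal{D}$ due to Lemma 4.1. Hence,

$$\mathcal{C}_{\mathbb{F}_q^N}\{r_i(\mathbf{a}_i\mathbf{x}^T)\} = \theta(\mathbf{x}) \tag{4.13}$$

for $\mathbf{a}_i \in \mathcal{H}\setminus\mathcal{D}$. Consequently,

$$p(\mathbf{x}) = \mathop{\boxplus}_{\mathbf{a}_i \in \mathcal{D}} \mathcal{C}_{\mathbb{F}_q^N}\{r_i(\mathbf{a}_i\mathbf{x}^T)\} \tag{4.14}$$

$$= \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{\mathbf{a}_i \in \mathcal{D}} r_i(\mathbf{a}_i\mathbf{x}^T)\Big\}, \tag{4.15}$$

which is the desired factorization and proves the lemma in the forward direction.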


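The proof in the backward direction is straightforward. If $p(\mathbf{x})$ can be factored as in (4.9)
then

$$p(\mathbf{x}\mathbf{D}) = \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{\mathbf{a}_i \in \mathcal{D}} r_i(\mathbf{a}_i(\mathbf{x}\mathbf{D})^T)\Big\} \tag{4.16}$$

$$= \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{\mathbf{a}_i \in \mathcal{D}} r_i(\mathbf{a}_i\mathbf{D}^T\mathbf{x}^T)\Big\}. \tag{4.17}$$

Since $\mathbf{D}$ is symmetric and $\mathbf{a}_i\mathbf{D} = \mathbf{a}_i$ for $\mathbf{a}_i \in \mathcal{D}$,

$$p(\mathbf{x}\mathbf{D}) = \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{\mathbf{a}_i \in \mathcal{D}} r_i(\mathbf{a}_i\mathbf{x}^T)\Big\} \tag{4.18}$$

$$= p(\mathbf{x}), \tag{4.19}$$

which completes the proof.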


In practice, this lemma says that any pmf satisfying the relation $p(\mathbf{x}) = p(\mathbf{x}\mathbf{D})$ can be expressed
as a product of $|\mathcal{D}|$ SPC factors rather than $|\mathcal{H}|$ SPC factors. Moreover, the parity check coefficient
vectors of these SPC factors satisfy the relation $\mathbf{a} = \mathbf{a}\mathbf{D}$. We do not need to compute the
projection of $p(\mathbf{x})$ onto $\operatorname{im}\{\mathcal{S}_{\mathbf{a}}\}$ if $\mathbf{a}$ is not in $\mathcal{D}$, since the result of that projection is always
$\theta(\mathbf{x})$.
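
As a small illustration of the set $\mathcal{D}$ in (4.8), the sketch below enumerates it for the binary field. It assumes that a dependency matrix is a diagonal 0/1 matrix that zeroes the arguments the pmf does not depend on; that form, and the function name `subset_for_dependency`, are assumptions made for the example only.

```python
from itertools import product

def subset_for_dependency(diag):
    """Return the vectors a of the binary set H (all nonzero vectors of F_2^N)
    with a D = a, where D = diag(diag) is an assumed 0/1 diagonal matrix."""
    n = len(diag)
    H = [a for a in product((0, 1), repeat=n) if any(a)]
    return [a for a in H if all(ai * di == ai for ai, di in zip(a, diag))]

# A pmf on F_2^3 that depends only on x_1 and x_2.
D_set = subset_for_dependency((1, 1, 0))
```

Here only three of the seven SPC factors can be nonconstant, namely those whose coefficient vectors are supported on the first two coordinates.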

The next theorem investigates the canonical factorization of pmfs with known alternative 
factorizations. 


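Theorem 4.4 If a multivariate pmf $p(\mathbf{x}) \in \mathcal{P}_{\mathbb{F}_q^N}$ can be factored as

$$p(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{j=1}^{K} \phi_j(\mathbf{x}\mathbf{D}_j)\Big\}, \tag{4.20}$$

where $\mathbf{D}_1, \mathbf{D}_2, \ldots, \mathbf{D}_K$ are dependency matrices, then the canonical factorization of $p(\mathbf{x})$ is in
the form of

$$p(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{j=1}^{K} \prod_{\mathbf{a}_i \in \mathcal{D}_j} r_{i,j}(\mathbf{a}_i\mathbf{x}^T)\Big\}, \tag{4.21}$$

where $\mathcal{D}_j$ is the subset of $\mathcal{H}$ given by

$$\mathcal{D}_j = \{\mathbf{a}_i \in \mathcal{H} : \mathbf{a}_i\mathbf{D}_j = \mathbf{a}_i\}. \tag{4.22}$$

Proof. This theorem is a direct consequence of Lemma 4.3. Let $t_j(\mathbf{x}) \in \mathcal{P}_{\mathbb{F}_q^N}$ be

$$t_j(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\{\phi_j(\mathbf{x}\mathbf{D}_j)\}. \tag{4.23}$$

Since $t_j(\mathbf{x})$ is equal to $t_j(\mathbf{x}\mathbf{D}_j)$,

$$t_j(\mathbf{x}) = \mathop{\boxplus}_{\mathbf{a}_i \in \mathcal{D}_j} \mathcal{C}_{\mathbb{F}_q^N}\{r_{i,j}(\mathbf{a}_i\mathbf{x}^T)\} \tag{4.24}$$

due to Lemma 4.3, where $\mathcal{C}_{\mathbb{F}_q^N}\{r_{i,j}(\mathbf{a}_i\mathbf{x}^T)\}$ denotes the projection of $t_j(\mathbf{x})$ onto $\operatorname{im}\{\mathcal{S}_{\mathbf{a}_i}\}$. Then $p(\mathbf{x})$ is

$$p(\mathbf{x}) = \mathop{\boxplus}_{j=1}^{K} t_j(\mathbf{x}) \tag{4.25}$$

$$= \mathop{\boxplus}_{j=1}^{K} \mathop{\boxplus}_{\mathbf{a}_i \in \mathcal{D}_j} \mathcal{C}_{\mathbb{F}_q^N}\{r_{i,j}(\mathbf{a}_i\mathbf{x}^T)\} \tag{4.26}$$

$$= \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{j=1}^{K} \prod_{\mathbf{a}_i \in \mathcal{D}_j} r_{i,j}(\mathbf{a}_i\mathbf{x}^T)\Big\}, \tag{4.27}$$

which completes the proof.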


The practical consequence of this theorem is that the canonical factorization of a pmf with an
alternative factorization can be derived by obtaining the canonical factorization of the factor
functions in the alternative factorization. This approach significantly simplifies the derivation
of the canonical factorization for such pmfs and is used extensively in a later chapter.
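
The effect described by the theorem can be observed on a three-variable binary Markov chain $p(x_1,x_2,x_3) \propto \phi_1(x_1,x_2)\,\phi_2(x_2,x_3)$. The sketch below rests on an assumption that is not stated in this section: in the binary case the SPC factor with coefficient vector $\mathbf{a}$ is constant exactly when the Walsh-Hadamard coefficient of $\log p$ at $\mathbf{a}$ vanishes. The factor tables `phi1` and `phi2` are hypothetical values.

```python
import math
from itertools import product

def wht_coeff(logp, a):
    """Walsh-Hadamard coefficient of log p at the binary vector a."""
    return sum(
        v * (-1) ** (sum(ai * xi for ai, xi in zip(a, x)) % 2)
        for x, v in logp.items()
    )

# Hypothetical local factors phi_1(x1, x2) and phi_2(x2, x3).
phi1 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 0.5}
phi2 = {(0, 0): 0.7, (0, 1): 1.5, (1, 0): 2.5, (1, 1): 1.0}

logp = {
    x: math.log(phi1[x[0], x[1]] * phi2[x[1], x[2]])
    for x in product((0, 1), repeat=3)
}

# Coefficient vectors whose SPC factor is (numerically) constant.
trivial = [
    a for a in product((0, 1), repeat=3)
    if any(a) and abs(wht_coeff(logp, a)) < 1e-12
]
```

The two vectors that touch both $x_1$ and $x_3$ drop out, leaving the five vectors in $\mathcal{D}_1 \cup \mathcal{D}_2$, which is what (4.21) predicts.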

4.6 The effect of reversible linear transformations on the canonical factorization

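If two random vectors are related by a reversible linear transformation then the canonical
factorization of the pmf of one of the random vectors can be derived from the canonical
factorization of the other random vector's pmf. Let $\mathbf{X}$ be an $\mathbb{F}_q^N$-valued random vector distributed
with $p(\mathbf{x}) \in \mathcal{P}_{\mathbb{F}_q^N}$. Moreover, let $\mathbf{Y}$ be another $\mathbb{F}_q^N$-valued random vector which is related to $\mathbf{X}$
as in

$$\mathbf{Y} = \mathbf{X}\mathbf{B} \tag{4.28}$$

where $\mathbf{B}$ is a reversible matrix in $\mathbb{F}_q^{N \times N}$. Since $\mathbf{B}$ is reversible, for each $\mathbf{y} \in \mathbb{F}_q^N$ there is one
and only one $\mathbf{x} \in \mathbb{F}_q^N$ satisfying $\mathbf{y} = \mathbf{x}\mathbf{B}$, which is given by $\mathbf{x} = \mathbf{y}\mathbf{B}^{-1}$. Hence,

$$\Pr\{\mathbf{Y} = \mathbf{y}\} = \Pr\{\mathbf{X} = \mathbf{y}\mathbf{B}^{-1}\} \tag{4.29}$$

$$= p(\mathbf{y}\mathbf{B}^{-1}). \tag{4.30}$$

If the canonical factorization of $p(\mathbf{x})$ is as given in

$$p(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{\mathbf{a}_i \in \mathcal{H}} r_i(\mathbf{a}_i\mathbf{x}^T)\Big\} \tag{4.31}$$

then the canonical factorization of $\Pr\{\mathbf{Y} = \mathbf{y}\}$ is simply

$$\Pr\{\mathbf{Y} = \mathbf{y}\} = \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{\mathbf{a}_i \in \mathcal{H}} r_i(\mathbf{a}_i\mathbf{x}^T)\Big\}\Big|_{\mathbf{x} = \mathbf{y}\mathbf{B}^{-1}} \tag{4.32}$$

$$= \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{\mathbf{a}_i \in \mathcal{H}} r_i\big(\mathbf{a}_i(\mathbf{B}^{-1})^T\mathbf{y}^T\big)\Big\}. \tag{4.33}$$

If $\mathbf{B}$ were not reversible then $\Pr\{\mathbf{Y} = \mathbf{y}\}$ would be zero for some $\mathbf{y}$ vectors in $\mathbb{F}_q^N$. Hence,
$\Pr\{\mathbf{Y} = \mathbf{y}\}$ would not be a multivariate pmf in $\mathcal{P}_{\mathbb{F}_q^N}$ and consequently we could not talk about
the canonical factorization of $\Pr\{\mathbf{Y} = \mathbf{y}\}$.

An interesting question about the linear transformations of $\mathbb{F}_q^N$-valued random vectors is
whether there exists a linear transformation $\mathbf{K}$ for a random vector $\mathbf{X}$ such that the
components of the vector $\mathbf{Y} = \mathbf{X}\mathbf{K}$ are statistically independent. If such a transformation exists it
would prove useful in computing the marginal pmfs of the components of the random vector
$\mathbf{X}$. The canonical factorization of $\Pr\{\mathbf{Y} = \mathbf{y}\}$ given in (4.33) provides a clue to this question.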


Theorem 4.5 There exists a matrix $\mathbf{K}$ in $\mathbb{F}_q^{N \times N}$ for an $\mathbb{F}_q^N$-valued random vector $\mathbf{X}$ such that
the components of the random vector $\mathbf{Y}$ given by

$$\mathbf{Y} = \mathbf{X}\mathbf{K} \tag{4.34}$$

are statistically independent if the canonical factorization of the pmf of $\mathbf{X}$ is composed of at
most $N$ SPC factors whose parity check coefficient vectors are all linearly independent.


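Proof. The proof is constructive. Let $p(\mathbf{x})$ be the pmf of $\mathbf{X}$ and the canonical factorization of
$p(\mathbf{x})$ be denoted as

$$p(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{\mathbf{a}_i \in \mathcal{K}} r_i(\mathbf{a}_i\mathbf{x}^T)\Big\}, \tag{4.35}$$

where $\mathcal{K}$ is a subset of $\mathcal{H}$ containing at most $N$ linearly independent vectors. Let $\mathcal{K}_c$ be a
subset of $\mathcal{H}$ such that it is a superset of $\mathcal{K}$ and it contains exactly $N$ linearly independent
vectors. Since $r_i(\mathbf{a}_i\mathbf{x}^T)$ is equal to $\theta(\mathbf{x})$ for $\mathbf{a}_i \in \mathcal{K}_c \setminus \mathcal{K}$, the canonical factorization of $p(\mathbf{x})$ can
also be expressed as

$$p(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{\mathbf{a}_i \in \mathcal{K}_c} r_i(\mathbf{a}_i\mathbf{x}^T)\Big\}. \tag{4.36}$$

We may define a matrix $\mathbf{K}_c$ whose rows are the elements of $\mathcal{K}_c$. Using this matrix the
canonical factorization of $p(\mathbf{x})$ becomes

$$p(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{j=1}^{N} r_{i(j)}(\mathbf{f}_j\mathbf{K}_c\mathbf{x}^T)\Big\}, \tag{4.37}$$

where $\mathbf{f}_j$ is the $j$-th canonical basis vector of $\mathbb{F}_q^N$ and $i(j)$ is the index of the vector $\mathbf{a}_i$ when
$\mathbf{a}_i = \mathbf{f}_j\mathbf{K}_c$. Then we may define the matrix $\mathbf{K}$ as

$$\mathbf{K} \triangleq \mathbf{K}_c^T, \tag{4.38}$$

so that $\mathbf{K}_c(\mathbf{K}^{-1})^T = \mathbf{I}$. With this definition of $\mathbf{K}$, the canonical factorization of $\Pr\{\mathbf{Y} = \mathbf{y}\}$ becomes

$$\Pr\{\mathbf{Y} = \mathbf{y}\} = \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{j=1}^{N} r_{i(j)}(\mathbf{f}_j\mathbf{K}_c\mathbf{x}^T)\Big\}\Big|_{\mathbf{x} = \mathbf{y}\mathbf{K}^{-1}} \tag{4.39}$$

$$= \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{j=1}^{N} r_{i(j)}\big(\mathbf{f}_j\mathbf{K}_c(\mathbf{K}^{-1})^T\mathbf{y}^T\big)\Big\} \tag{4.40}$$

$$= \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{j=1}^{N} r_{i(j)}(\mathbf{f}_j\mathbf{y}^T)\Big\} \tag{4.41}$$

$$= \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{j=1}^{N} r_{i(j)}(y_j)\Big\}, \tag{4.42}$$

where $y_j$ is the $j$-th component of $\mathbf{y}$. Since $\Pr\{\mathbf{Y} = \mathbf{y}\}$ is separable, the components of $\mathbf{Y}$ are
statistically independent. Moreover, the distribution of the $j$-th component of $\mathbf{Y}$ is simply

$$\Pr\{Y_j = y\} = r_{i(j)}(y). \tag{4.43}$$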

In the general case, the marginal pmfs of the components of an $\mathbb{F}_q^N$-valued random vector $\mathbf{X}$
can be computed via the marginalization sum whose complexity is $q^N$. If the multivariate
pmf of $\mathbf{X}$ obeys the condition imposed in Theorem 4.5 then $\mathbf{X}$ can be related to $\mathbf{Y}$, whose
components are statistically independent, as

$$\mathbf{X} = \mathbf{Y}\mathbf{K}^{-1}. \tag{4.44}$$

This means that any component of $\mathbf{X}$ is equal to a linear combination of $N$ statistically
independent random variables. Hence, the marginal pmfs of the components of $\mathbf{X}$ can be computed
via $N - 1$ circular convolutions over $\mathbb{F}_q$ instead of the marginalization sum. Consequently, the
complexity of computing a single marginal pmf is $Nq^2$ and the complexity of computing all
marginal pmfs is $N^2q^2$ instead of $q^N$ for such random vectors.¹
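
The convolution shortcut can be illustrated for the binary field, where circular convolution over $\mathbb{F}_2$ gives the pmf of the XOR of two independent bits. The sketch below is an illustration only: the component pmfs `y_pmfs` and the matrix `K_inv` are hypothetical values, and the brute-force sum is included purely for comparison.

```python
from itertools import product

def xor_convolve(p, r):
    """Circular convolution over F_2: pmf of the sum of two independent bits."""
    return [p[0] * r[0] + p[1] * r[1], p[0] * r[1] + p[1] * r[0]]

# Hypothetical independent components Y_1, Y_2, Y_3 and an invertible K^{-1}.
y_pmfs = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
K_inv = [[1, 1, 0],
         [0, 1, 1],
         [0, 0, 1]]
N = 3

def marginal_by_convolution(i):
    """pmf of X_i = sum_j Y_j * K_inv[j][i], using at most N - 1 convolutions."""
    pmf = [1.0, 0.0]  # pmf of the constant 0, the identity of xor_convolve
    for j in range(N):
        if K_inv[j][i]:
            pmf = xor_convolve(pmf, y_pmfs[j])
    return pmf

def marginal_by_sum(i):
    """Reference result: the full marginalization sum over all q^N vectors."""
    pmf = [0.0, 0.0]
    for y in product((0, 1), repeat=N):
        prob = 1.0
        for j in range(N):
            prob *= y_pmfs[j][y[j]]
        x_i = sum(y[j] * K_inv[j][i] for j in range(N)) % 2
        pmf[x_i] += prob
    return pmf
```

Both functions return the same marginal; only the second one visits every configuration of $\mathbf{Y}$.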


¹ These complexities can be reduced even further to $Nq\log_2 q$ and $N^2q\log_2 q$ by computing the convolutions
via FFT if $\mathbb{F}_q$ is an extension field of the binary field $\mathbb{F}_2$.

CHAPTER 5


EMPLOYING CHANNEL DECODERS FOR INFERENCE
TASKS BEYOND DECODING


5.1 Introduction 

This chapter explains what is, subjectively, the most important consequence of the canonical
factorization: it allows the decoders of linear error correction codes to be utilized in other
inference tasks.

This chapter starts with an overview of channel decoders. Then how a maximum likelihood
(ML) decoder can be used to maximize a multivariate pmf is explained. It is shown in Section
5.4 that symbolwise decoders can be employed to marginalize multivariate pmfs. Section 5.5
highlights that the decoders of the dual Hamming code can be used as universal inference
machines. Special cases are analyzed in Section 5.6. The material presented in this chapter
is summarized with graphical models in Section 5.7. This chapter ends by explaining the possible
applications of employing channel decoders for inference tasks beyond decoding.

5.2 An overview of channel decoders 

A channel decoder is specified by a code and a channel through which the coded symbols are
transmitted. A code $C$ over a finite field $\mathbb{F}_q$ of length $L$ is defined as a subset of $\mathbb{F}_q^L$. The code
is called a linear code if $C$ is a subspace of $\mathbb{F}_q^L$. For linear codes there exists a matrix $\mathbf{H}$ which
satisfies

$$\mathbf{H}\mathbf{x}^T = \mathbf{0} \quad \forall \mathbf{x} \in C. \tag{5.1}$$

The matrix $\mathbf{H}$ is called the parity check matrix of the code.

A channel is a system which maps an $\mathbb{F}_q$-valued symbol to an element of the output alphabet
in a probabilistic manner.* We assume that the channel decoders used in the rest of this
chapter are designed for a specific channel. This channel relates the inputs to the outputs via
the following relation

$$\mathbf{Y}_i = \mathbf{s}(X_i) + \mathbf{Z}_i, \tag{5.2}$$

where $\mathbf{Z}_i$ is a noise vector consisting of independent, zero-mean, real Gaussian random
variables with unit variance and $\mathbf{s}(\cdot)$ denotes the simplex mapping as defined in (2.31). The
likelihood function of this channel, which is a conditional probability density function of a
continuous random vector, is

$$f_{\mathbf{Y}_i|X_i}\{\mathbf{Y}_i = \mathbf{y}_i \mid X_i = x_i\} \propto \exp\!\Big(-\tfrac{1}{2}\,\|\mathbf{y}_i - \mathbf{s}(x_i)\|^2\Big) \tag{5.3}$$

$$\propto \mathcal{L}^{+}\{\mathbf{y}_i\}(x_i). \tag{5.4}$$

The reasoning behind the selection of this channel model is explained in Section 5.6.2.


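Let $\mathbf{X} = [X_1, X_2, \ldots, X_L]$ denote a codeword belonging to the code $C$ and $\mathbf{Y} = [\mathbf{Y}_1, \mathbf{Y}_2, \ldots, \mathbf{Y}_L]$
denote the output of the channel when $\mathbf{X}$ is transmitted through this channel. If all codewords
are equally likely then the a posteriori probability (APP) of $\mathbf{X}$ is

$$\Pr\{\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y}\} = \mathcal{C}_{\mathbb{F}_q^L}\Big\{I_C(\mathbf{x}) \prod_{i=1}^{L} f_{\mathbf{Y}_i|X_i}\{\mathbf{Y}_i = \mathbf{y}_i \mid X_i = x_i\}\Big\} \tag{5.5}$$

$$= \mathcal{C}_{\mathbb{F}_q^L}\Big\{I_C(\mathbf{x}) \prod_{i=1}^{L} \mathcal{L}^{+}\{\mathbf{y}_i\}(x_i)\Big\}, \tag{5.6}$$

where $\mathbf{x} = [x_1, x_2, \ldots, x_L]$, $\mathbf{y} = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_L]$, and $I_C(\cdot)$ denotes the indicator function, i.e.,

$$I_C(\mathbf{x}) = \begin{cases} 1, & \mathbf{x} \in C \\ 0, & \mathbf{x} \notin C. \end{cases} \tag{5.7}$$

If the code $C$ is a linear code with the parity check matrix $\mathbf{H}$ consisting of $M$ rows then

$$I_C(\mathbf{x}) = \prod_{i=1}^{M} \delta(\mathbf{h}_i\mathbf{x}^T), \tag{5.8}$$

where $\mathbf{h}_i$ denotes the $i$-th row of $\mathbf{H}$. Consequently, the APP of $\mathbf{X}$ is

$$\Pr\{\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y}\} = \mathcal{C}_{\mathbb{F}_q^L}\Big\{\prod_{i=1}^{M} \delta(\mathbf{h}_i\mathbf{x}^T) \prod_{i=1}^{L} \mathcal{L}^{+}\{\mathbf{y}_i\}(x_i)\Big\}. \tag{5.9}$$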

* This definition of channel includes the modulator when necessary.

There are two decoding problems that can be associated with a code and the channel model
defined above [22]. The first of these decoding problems is the codeword decoding problem,
which is the task of inferring the transmitted codeword. This task is accomplished by
finding the codeword which maximizes the APP $\Pr\{\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y}\}$. Hence, this decoding is
called the maximum a posteriori (MAP) codeword decoding. The MAP codeword decoding
can be formally defined as

$$\hat{\mathbf{x}}_{MAP} = \arg\max_{\mathbf{x} \in C} \Pr\{\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y}\}. \tag{5.10}$$

If all codewords are equally likely then the MAP codeword decoding problem is equivalent to
the maximum likelihood (ML) codeword decoding problem which maximizes the likelihood
function $f_{\mathbf{Y}|\mathbf{X}}\{\mathbf{Y} = \mathbf{y} \mid \mathbf{X} = \mathbf{x}\}$ instead of the APP, i.e.,

$$\hat{\mathbf{x}}_{ML} = \arg\max_{\mathbf{x} \in C} f_{\mathbf{Y}|\mathbf{X}}\{\mathbf{Y} = \mathbf{y} \mid \mathbf{X} = \mathbf{x}\} \tag{5.11}$$

$$= \arg\max_{\mathbf{x} \in C} \Pr\{\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y}\} \tag{5.12}$$

$$= \hat{\mathbf{x}}_{MAP}. \tag{5.13}$$


Both MAP and ML codeword decoding problems can be solved by the min-sum (max- 
product) algorithm, the most famous example of which is the Viterbi algorithm El 1^171 1331 . 

The second decoding problem is the symbolwise decoding problem which aims to produce a
soft prediction about the individual coded symbols. This task is accomplished by
marginalizing the APP as in

$$\Pr\{X_i = x_i \mid \mathbf{Y} = \mathbf{y}\} = \sum_{\sim\{x_i\}} \Pr\{\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y}\} \tag{5.14}$$

where $\sim\{x_i\}$ is the summary notation introduced in |[T]| and indicates that the summation runs
over the variables $x_1, x_2, \ldots, x_{i-1}, x_{i+1}, x_{i+2}, \ldots, x_L$. The symbolwise decoding problem
is solved by the sum-product algorithm whose most famous example is the BCJR algorithm

II21121. 
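
To make the two decoding problems concrete, the sketch below evaluates (5.10) and (5.14) by exhaustive enumeration for a toy binary code. It is not the min-sum or sum-product algorithm; it only states what those algorithms compute. The table `like[i][v]`, standing for a quantity proportional to the likelihood that symbol $i$ equals $v$, and all function names are assumptions of the example.

```python
from itertools import product

def codewords(H, q=2):
    """All x in F_q^L (q prime) with H x^T = 0."""
    n = len(H[0])
    return [
        x for x in product(range(q), repeat=n)
        if all(sum(h * xi for h, xi in zip(row, x)) % q == 0 for row in H)
    ]

def weight(x, like):
    """Unnormalized APP of the codeword x: product of per-symbol likelihoods."""
    w = 1.0
    for i, v in enumerate(x):
        w *= like[i][v]
    return w

def map_codeword(H, like, q=2):
    """Exhaustive version of (5.10)."""
    return max(codewords(H, q), key=lambda x: weight(x, like))

def symbolwise(H, like, i, q=2):
    """Exhaustive version of (5.14): marginal APP of symbol i."""
    totals = [0.0] * q
    for x in codewords(H, q):
        totals[x[i]] += weight(x, like)
    s = sum(totals)
    return [t / s for t in totals]

H = [[1, 1, 1]]                                   # single parity check code
like = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]       # hypothetical channel evidence
best = map_codeword(H, like)
marg0 = symbolwise(H, like, 0)
```

Note that the hard decision taken per symbol from `symbolwise` need not form a codeword in general, which is why the two problems are treated separately.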

5.3 Maximizing a multivariate pmf by using an ML codeword decoder 

We begin this section with the following example. This example might be impractical but it is
the simplest possible example to demonstrate the idea. We will generalize the idea after this
example.

Example 5.1 Suppose that we are required to implement a device which finds the
configuration maximizing a pmf $p(x_1, x_2) \in \mathcal{P}_{\mathbb{F}_2^2}$. This device is supposed to return the pair $(x_1, x_2)$
which maximizes $p(x_1, x_2)$ after receiving the values $p(0,0)$, $p(0,1)$, $p(1,0)$, and $p(1,1)$ as
input. Assume that while implementing this device we can use a handicapped processor which
can only add two numbers, negate a number, and compute the logarithm of a number but
cannot compare two numbers. Further assume that to compensate for the handicap of the processor
we are given the ML codeword decoder hardware of the linear code with the parity check
matrix

$$\mathbf{H} = [\,1 \;\; 1 \;\; 1\,], \tag{5.15}$$

which is designed for the channel model described in Section 5.2.

If the processor at hand were a regular processor which could compare two numbers then
the solution of this problem would be obvious. Since this processor cannot compare two
numbers, we need to figure out another solution by employing the ML codeword decoder. In
this solution we should use the processor to compute the three input vectors² to be applied to
the decoder from the inputs applied to the whole system.


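We sketch a solution as follows. Let the input vectors applied to the decoder be $\mathbf{y}_1$, $\mathbf{y}_2$, and
$\mathbf{y}_3$. By (5.9) this decoder will return the following vector $\hat{\mathbf{x}}_{ML} = [\hat{x}_1, \hat{x}_2, \hat{x}_3]$:

$$\hat{\mathbf{x}}_{ML} = \arg\max_{[x_1,x_2,x_3] \in C} \delta(x_1 + x_2 + x_3) \prod_{i=1}^{3} \mathcal{L}^{+}\{\mathbf{y}_i\}(x_i). \tag{5.16}$$

Since $x_3 = x_1 + x_2$ for every codeword in $C$,

$$\hat{\mathbf{x}}_{ML} = \arg\max_{[x_1,x_2,x_3] \in C} \mathcal{L}^{+}\{\mathbf{y}_1\}(x_1)\, \mathcal{L}^{+}\{\mathbf{y}_2\}(x_2)\, \mathcal{L}^{+}\{\mathbf{y}_3\}(x_1 + x_2). \tag{5.17}$$

Due to Corollary 3.5 we know that any $p(x_1, x_2) \in \mathcal{P}_{\mathbb{F}_2^2}$ can be expressed as

$$p(x_1, x_2) = \mathcal{C}_{\mathbb{F}_2^2}\{r_1(x_1)\, r_2(x_2)\, r_3(x_1 + x_2)\}. \tag{5.18}$$

Hence, if we apply $\mathbf{y}_i = \mathcal{L}\{r_i(x)\}$ to the decoder then the decoder computes

$$\hat{\mathbf{x}}_{ML} = \arg\max_{[x_1,x_2,x_3] \in C} r_1(x_1)\, r_2(x_2)\, r_3(x_1 + x_2) \tag{5.19}$$

$$= \arg\max_{[x_1,x_2,x_3] \in C} p(x_1, x_2). \tag{5.20}$$

The first two components of $\hat{\mathbf{x}}_{ML}$ are the result we are looking for.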


² Recall that the channel model given in (5.2) maps each bit to a vector.

Figure 5.1: The block diagram of the solution to the problem in Example 5.1 (inputs: p(0,0), p(0,1), p(1,0), p(1,1); output: Result)

The only missing component of the solution is computing $\mathbf{y}_i = \mathcal{L}\{r_i(x)\}$. These vectors can be
derived using the earlier discussion of the canonical factorization as

$$\begin{aligned}
\mathbf{y}_1 &= [\,\log p(0,0) + \log p(0,1) - \log p(1,0) - \log p(1,1)\,]\,[\,1 \;\; {-1}\,] \\
\mathbf{y}_2 &= [\,\log p(0,0) - \log p(0,1) + \log p(1,0) - \log p(1,1)\,]\,[\,1 \;\; {-1}\,] \\
\mathbf{y}_3 &= [\,\log p(0,0) - \log p(0,1) - \log p(1,0) + \log p(1,1)\,]\,[\,1 \;\; {-1}\,].
\end{aligned} \tag{5.21}$$

Fortunately, our handicapped processor can be programmed to accomplish this subtask. The
block diagram of the solution is depicted in Figure 5.1.
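
Example 5.1 can be replayed numerically. In the sketch below the ML decoder hardware is replaced by a brute-force search over the four codewords of $\mathbf{H} = [1\;1\;1]$, and each decoder input is represented only by the scalar coefficient in (5.21). Treating the decoder metric of symbol $i$ as $+u_i$ for $x_i = 0$ and $-u_i$ for $x_i = 1$ is an assumption that absorbs the channel scaling; the pmf values are hypothetical.

```python
import math
from itertools import product

def decoder_inputs(p):
    """Scalar coefficients of (5.21), one per decoder input (adds, negations, logs only)."""
    lp = {x: math.log(v) for x, v in p.items()}
    u1 = lp[0, 0] + lp[0, 1] - lp[1, 0] - lp[1, 1]
    u2 = lp[0, 0] - lp[0, 1] + lp[1, 0] - lp[1, 1]
    u3 = lp[0, 0] - lp[0, 1] - lp[1, 0] + lp[1, 1]
    return [u1, u2, u3]

def ml_decode_spc(u):
    """Brute-force stand-in for the ML codeword decoder of H = [1 1 1].

    Assumed metric: symbol i contributes +u[i] when it is 0 and -u[i] when it is 1.
    """
    code = [c for c in product((0, 1), repeat=3) if sum(c) % 2 == 0]
    return max(code, key=lambda c: sum(ui * (1 - 2 * ci) for ui, ci in zip(u, c)))

# Hypothetical strictly positive pmf p(x1, x2).
p = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.15, (1, 1): 0.25}
x_hat = ml_decode_spc(decoder_inputs(p))[:2]
```

All comparisons happen inside `ml_decode_spc`, mirroring the division of labour between the processor and the decoder in the example.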


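An ML codeword decoder can be utilized to maximize a multivariate pmf $t(\mathbf{x})$ if it can be
expressed as a product of parity-check (zero-sum) constraints and degree one factors as in

$$t(\mathbf{x}) = \prod_{i=1}^{M} \delta(\mathbf{h}_i\mathbf{x}^T) \prod_{i=1}^{L} \phi_i(x_i). \tag{5.22}$$

The decoder which can maximize this pmf is the decoder of the linear code with the parity
check matrix $\mathbf{H}$ given by

$$\mathbf{H} = \begin{bmatrix} \mathbf{h}_1 \\ \mathbf{h}_2 \\ \vdots \\ \mathbf{h}_M \end{bmatrix}. \tag{5.23}$$

If $\mathbf{y}_i = \mathcal{L}\{\mathcal{C}_{\mathbb{F}_q}\{\phi_i(x)\}\}$ is applied as the input to the decoder then the ML codeword decoder
performs the following maximization

$$\hat{\mathbf{x}}_{ML} = \arg\max_{\mathbf{x}} \prod_{i=1}^{M} \delta(\mathbf{h}_i\mathbf{x}^T) \prod_{i=1}^{L} \mathcal{L}^{+}\{\mathbf{y}_i\}(x_i) \tag{5.24}$$

$$= \arg\max_{\mathbf{x}} \prod_{i=1}^{M} \delta(\mathbf{h}_i\mathbf{x}^T) \prod_{i=1}^{L} \phi_i(x_i) \tag{5.25}$$

$$= \arg\max_{\mathbf{x}} t(\mathbf{x}), \tag{5.26}$$

which is the desired maximization.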


Unfortunately, most pmfs cannot be factored as in (5.22). Therefore, it might seem
that utilization of an ML codeword decoder for maximizing a pmf has limited applicability.
However, for any pmf in $\mathcal{P}_{\mathbb{F}_q^N}$ we can find a substitute pmf which factors as in (5.22) and can
be used to maximize the original pmf. Consequently, ML codeword decoders can be utilized
in the maximization of a broad range of pmfs.


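We derive such a substitute pmf based on the canonical factorization. Let $p(\mathbf{x}) \in \mathcal{P}_{\mathbb{F}_q^N}$ be the
multivariate pmf which we want to maximize by using an ML codeword decoder. Due to Corollary
3.5 $p(\mathbf{x})$ can be expressed as a product of SPC constraints as in

$$p(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{i=1}^{|\mathcal{H}|} r_i(\mathbf{a}_i\mathbf{x}^T)\Big\}, \tag{5.27}$$

where $\mathcal{H}$ is $\{\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_{|\mathcal{H}|}\}$ and $\mathcal{C}_{\mathbb{F}_q^N}\{r_i(\mathbf{a}_i\mathbf{x}^T)\}$ is the projection of $p(\mathbf{x})$ onto $\operatorname{im}\{\mathcal{S}_{\mathbf{a}_i}\}$. Recall
that the set $\mathcal{H}$ consists of $\frac{q^N-1}{q-1}$ pairwise linearly independent parity check vectors. Since all
of these parity check coefficient vectors are pairwise linearly independent, $N$ of them have to
be of weight one. Without loss of generality we may assume that these weight one vectors are
the first $N$ parity check vectors in $\mathcal{H}$, i.e. $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_N$. Then we may define a matrix $\mathbf{A}$ by
using the remaining parity check coefficient vectors in $\mathcal{H}$ as

$$\mathbf{A} \triangleq \begin{bmatrix} \mathbf{a}_{N+1} \\ \mathbf{a}_{N+2} \\ \vdots \\ \mathbf{a}_{|\mathcal{H}|} \end{bmatrix}. \tag{5.28}$$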


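We will use $\mathbf{A}$ while defining the substitute pmf for $p(\mathbf{x})$. This substitute pmf has an extended
argument vector $\mathbf{x}_E$ consisting of $L$ components where $L$ is equal to $|\mathcal{H}|$. This extended
argument vector is defined as

$$\mathbf{x}_E = [\,\mathbf{x} \;\; \mathbf{x}_A\,], \tag{5.29}$$

where $\mathbf{x}$ and $\mathbf{x}_A$ are given by

$$\mathbf{x} \triangleq [\,x_1 \;\; x_2 \;\; \ldots \;\; x_N\,], \tag{5.30}$$

$$\mathbf{x}_A \triangleq [\,x_{N+1} \;\; x_{N+2} \;\; \ldots \;\; x_L\,]. \tag{5.31}$$

Finally, we propose the substitute pmf for $p(\mathbf{x})$ as

$$t_p(\mathbf{x}_E) \triangleq \begin{cases} p(\mathbf{x}), & \text{if } \mathbf{x}_A^T = \mathbf{A}\mathbf{x}^T \\ 0, & \text{otherwise.} \end{cases} \tag{5.32}$$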

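Clearly, $t_p(\mathbf{x}_E)$ achieves its maximum value at a configuration $\mathbf{x}_{E,MAX}$ which is equal to

$$\mathbf{x}_{E,MAX} = \arg\max_{\mathbf{x}_E} t_p(\mathbf{x}_E) \tag{5.33}$$

$$= [\,\mathbf{x}_{MAX} \;\; \mathbf{x}_{MAX}\mathbf{A}^T\,], \tag{5.34}$$

where $\mathbf{x}_{MAX}$ is the configuration maximizing $p(\mathbf{x})$. Due to this property any device which
determines the configuration maximizing $t_p(\mathbf{x}_E)$ also determines the configuration maximizing
$p(\mathbf{x})$ at the same time.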

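Now we need to show that $t_p(\mathbf{x}_E)$ can be maximized by an ML codeword decoder. As a
first step, we can obtain an equivalent alternative definition of $t_p(\mathbf{x}_E)$ using parity check
constraints as

$$t_p(\mathbf{x}_E) = p(\mathbf{x}) \prod_{i=N+1}^{L} \delta(\mathbf{a}_i\mathbf{x}^T - x_i). \tag{5.35}$$

Inserting the canonical factorization of $p(\mathbf{x})$ into the equation above yields

$$t_p(\mathbf{x}_E) = \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{i=1}^{L} r_i(\mathbf{a}_i\mathbf{x}^T)\Big\} \prod_{i=N+1}^{L} \delta(\mathbf{a}_i\mathbf{x}^T - x_i) \tag{5.36}$$

$$= \mathcal{C}_{\mathbb{F}_q^L}\Big\{\prod_{i=1}^{N} r_i(\mathbf{a}_i\mathbf{x}^T) \prod_{i=N+1}^{L} r_i(\mathbf{a}_i\mathbf{x}^T)\,\delta(\mathbf{a}_i\mathbf{x}^T - x_i)\Big\}. \tag{5.37}$$

Recall that we assumed the first $N$ vectors $\mathbf{a}_i$ to be of weight one while defining the matrix $\mathbf{A}$.
Hence, we may safely assume further that these $N$ vectors are the canonical basis vectors
of $\mathbb{F}_q^N$. With this assumption the factorization of $t_p(\mathbf{x}_E)$ becomes

$$t_p(\mathbf{x}_E) = \mathcal{C}_{\mathbb{F}_q^L}\Big\{\prod_{i=1}^{N} r_i(x_i) \prod_{i=N+1}^{L} r_i(\mathbf{a}_i\mathbf{x}^T)\,\delta(\mathbf{a}_i\mathbf{x}^T - x_i)\Big\}. \tag{5.38}$$

The only remaining step to obtain a factorization as in (5.22) is to replace $r_i(\mathbf{a}_i\mathbf{x}^T)\,\delta(\mathbf{a}_i\mathbf{x}^T - x_i)$
with $r_i(x_i)\,\delta(\mathbf{a}_i\mathbf{x}^T - x_i)$, which yields

$$t_p(\mathbf{x}_E) = \mathcal{C}_{\mathbb{F}_q^L}\Big\{\prod_{i=1}^{N} r_i(x_i) \prod_{i=N+1}^{L} r_i(x_i)\,\delta(\mathbf{a}_i\mathbf{x}^T - x_i)\Big\} \tag{5.39}$$

$$= \mathcal{C}_{\mathbb{F}_q^L}\Big\{\prod_{i=1}^{L} r_i(x_i) \prod_{i=N+1}^{L} \delta(\mathbf{a}_i\mathbf{x}^T - x_i)\Big\}. \tag{5.40}$$

Since all the factor functions above are either degree one factor functions or parity-check
constraints, an ML decoder of a linear code can be utilized to maximize $t_p(\mathbf{x}_E)$.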

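The parity check matrix of the linear code which can be used to maximize $t_p(\mathbf{x}_E)$, and
consequently $p(\mathbf{x})$ at the same time, can be found as follows. Let a parity check coefficient vector $\mathbf{h}_i$
of length $L$ be defined as in

$$\mathbf{h}_i = [\,\mathbf{a}_{i+N} \;\; \mathbf{0}_{1 \times (i-1)} \;\; {-1} \;\; \mathbf{0}_{1 \times (L-i-N)}\,] \quad \text{for } 1 \le i \le L - N. \tag{5.41}$$

Equation (5.40) can be expressed using $\mathbf{h}_i$ as

$$t_p(\mathbf{x}_E) = \mathcal{C}_{\mathbb{F}_q^L}\Big\{\prod_{i=1}^{L} r_i(x_i) \prod_{i=1}^{L-N} \delta(\mathbf{h}_i\mathbf{x}_E^T)\Big\}. \tag{5.42}$$

Hence, the parity check matrix $\mathbf{H}$ of the code whose ML codeword decoder can be used to
maximize $t_p(\mathbf{x}_E)$ and $p(\mathbf{x})$ is

$$\mathbf{H} = \begin{bmatrix} \mathbf{h}_1 \\ \mathbf{h}_2 \\ \vdots \\ \mathbf{h}_{L-N} \end{bmatrix} \tag{5.43}$$

$$= [\,\mathbf{A} \;\; {-\mathbf{I}_{(L-N) \times (L-N)}}\,]. \tag{5.44}$$

The $L$ input vectors that should be applied to maximize $p(\mathbf{x})$ and $t_p(\mathbf{x}_E)$ are

$$\mathbf{y}_i = \mathcal{L}\{r_i(x)\} \quad \text{for } 1 \le i \le L. \tag{5.45}$$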

To sum up, with these input vectors the ML codeword decoder of the linear code with parity
check matrix $\mathbf{H}$ returns

$$\hat{\mathbf{x}}_{E,ML} = \arg\max_{\mathbf{x}_E} \prod_{i=1}^{L} r_i(x_i) \prod_{i=1}^{L-N} \delta(\mathbf{h}_i\mathbf{x}_E^T) \tag{5.46}$$

$$= \arg\max_{\mathbf{x}_E} t_p(\mathbf{x}_E). \tag{5.47}$$

Figure 5.2: Summary of the utilization of an ML codeword decoder for maximizing a pmf

Due to (5.34) the leading $N$ components of $\hat{\mathbf{x}}_{E,ML}$ form the vector $\mathbf{x}_{MAX}$ maximizing $p(\mathbf{x})$, which
is what we are seeking. We can ignore the rest of the $\hat{\mathbf{x}}_{E,ML}$ vector. The whole process of finding
the configuration maximizing $p(\mathbf{x})$ is summarized in Figure 5.2.

It is well known that the ML codeword decoding problem is a special instance of the problem of
maximizing a multivariate pmf. In this section, we showed that there exists a special ML codeword
decoding problem which can handle the maximization task of an arbitrary multivariate pmf.
Hence, the converse of the well known statement above is also true. Therefore, we can conclude
that ML codeword decoding and maximization of multivariate pmfs are equivalent problems.
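
The whole construction can be exercised end to end for $N = 3$ over the binary field. The sketch below is a numerical illustration under stated assumptions, not the decoder hardware discussed in the text: the canonical factors are obtained from a Walsh-Hadamard transform of $\log p$ (valid only for $q = 2$), the decoder inputs are reduced to scalars $u_i$, and the ML decoder is a brute-force search over the code with parity check matrix $[\mathbf{A}\;{-\mathbf{I}}]$. The pmf values are hypothetical.

```python
import math
from itertools import product

N = 3
basis = [tuple(int(i == j) for j in range(N)) for i in range(N)]  # a_1, ..., a_N
A = [a for a in product((0, 1), repeat=N) if sum(a) >= 2]         # a_{N+1}, ..., a_L
vectors = basis + A
L = len(vectors)                                                  # 2^N - 1 = 7

def dot2(a, x):
    return sum(ai * xi for ai, xi in zip(a, x)) % 2

def canonical_weights(p):
    """u_i with log p(x) = const + sum_i u_i * (-1)^(a_i x^T), binary case only."""
    return [
        sum(math.log(p[x]) * (-1) ** dot2(a, x) for x in p) / 2 ** N
        for a in vectors
    ]

# H = [A  -I]; over F_2 the block -I equals I.
H = [list(a) + [int(i == j) for j in range(L - N)] for i, a in enumerate(A)]

def ml_decode(u):
    """Brute-force stand-in for the ML codeword decoder of the code defined by H."""
    best, best_metric = None, -math.inf
    for xe in product((0, 1), repeat=L):
        if any(dot2(h, xe) for h in H):
            continue  # a parity check fails, so t_p(x_E) = 0
        metric = sum(ui * (1 - 2 * xi) for ui, xi in zip(u, xe))
        if metric > best_metric:
            best, best_metric = xe, metric
    return best

# Hypothetical strictly positive pmf on F_2^3.
weights = [3, 1, 4, 1, 5, 9, 2, 6]
p = {x: w / sum(weights) for x, w in zip(product((0, 1), repeat=N), weights)}

x_max = ml_decode(canonical_weights(p))[:N]
```

Only the leading $N$ components of the decoder output are kept, as described after (5.47).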

5.4 Marginalizing a multivariate pmf by using a symbolwise decoder 

Let an $\mathbb{F}_q^N$-valued random vector $\mathbf{X} = [X_1, X_2, \ldots, X_N]$ be distributed with a $p(\mathbf{x}) \in \mathcal{P}_{\mathbb{F}_q^N}$. The
marginal pmf of $X_i$ is

$$\Pr\{X_i = x_i\} = \sum_{\sim\{x_i\}} p(\mathbf{x}). \tag{5.48}$$

A symbolwise decoder can perform this marginalization if $p(\mathbf{x})$ can be factored into degree
one factor functions and parity-check constraints, which is not possible for a strictly positive
pmf. However, as we did in the previous section, for each $p(\mathbf{x}) \in \mathcal{P}_{\mathbb{F}_q^N}$ we can obtain a substitute
multivariate pmf which has the desired factorization and can be used in the marginalization
of $p(\mathbf{x})$.

We can follow a more straightforward path to obtain this substitute pmf than in the
previous section. Inserting the canonical factorization of $p(\mathbf{x})$ into the marginalization sum
above yields

$$\Pr\{X_i = x_i\} = \sum_{\sim\{x_i\}} \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{j=1}^{L} r_j(\mathbf{a}_j\mathbf{x}^T)\Big\} \tag{5.49}$$

$$= \mathcal{C}_{\mathbb{F}_q}\Big\{\sum_{\sim\{x_i\}} \prod_{j=1}^{L} r_j(\mathbf{a}_j\mathbf{x}^T)\Big\} \tag{5.50}$$

$$= \mathcal{C}_{\mathbb{F}_q}\Big\{\sum_{\sim\{x_i\}} \prod_{j=1}^{N} r_j(x_j) \prod_{j=N+1}^{L} r_j(\mathbf{a}_j\mathbf{x}^T)\Big\}, \tag{5.51}$$

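where we make the same assumptions as in the previous section about the canonical
factorization of $p(\mathbf{x})$. This equation shows that $N$ of the factor functions are already of degree one.
The remaining factor functions, which are SPC constraints, can be expressed by using the
sifting property of the Kronecker delta function as

$$r_j(\mathbf{a}_j\mathbf{x}^T) = \sum_{x_j \in \mathbb{F}_q} \delta(\mathbf{a}_j\mathbf{x}^T - x_j)\, r_j(x_j) \quad \text{for } N + 1 \le j \le L. \tag{5.52}$$

Since $j$ is greater than $N$, $x_j$ above is not a component of the vector $\mathbf{x}$ and is just a dummy variable.
Using this identity in the marginalization sum gives

$$\Pr\{X_i = x_i\} = \mathcal{C}_{\mathbb{F}_q}\Big\{\sum_{\sim\{x_i\}} \prod_{j=1}^{N} r_j(x_j) \prod_{j=N+1}^{L} \sum_{x_j \in \mathbb{F}_q} \delta(\mathbf{a}_j\mathbf{x}^T - x_j)\, r_j(x_j)\Big\} \tag{5.53}$$

$$= \mathcal{C}_{\mathbb{F}_q}\Big\{\sum_{\sim\{x_i\}} \sum_{\mathbf{x}_A \in \mathbb{F}_q^{L-N}} \prod_{j=1}^{L} r_j(x_j) \prod_{j=N+1}^{L} \delta(\mathbf{a}_j\mathbf{x}^T - x_j)\Big\}, \tag{5.54}$$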

where $\mathbf{x}_A$ is as defined in (5.31). Thanks to the summary notation the summation running over
$\mathbf{x}_A$ can be merged into the first summation, which yields

$$\Pr\{X_i = x_i\} = \mathcal{C}_{\mathbb{F}_q}\Big\{\sum_{\sim\{x_i\}} \prod_{j=1}^{L} r_j(x_j) \prod_{j=1}^{L-N} \delta(\mathbf{h}_j\mathbf{x}_E^T)\Big\}, \tag{5.55}$$

where $\mathbf{x}_E$ and $\mathbf{h}_j$ are defined in (5.29) and (5.41) respectively. Notice that the two products
above form the factorization of $t_p(\mathbf{x}_E)$, which is defined in (5.32), given in (5.42). Therefore,

$$\Pr\{X_i = x_i\} = \mathcal{C}_{\mathbb{F}_q}\Big\{\sum_{\sim\{x_i\}} t_p(\mathbf{x}_E)\Big\} \tag{5.56}$$

$$= \sum_{\sim\{x_i\}} t_p(\mathbf{x}_E). \tag{5.57}$$

This result shows that the marginal probability of $X_i$, $\Pr\{X_i = x_i\}$, can be computed either by
marginalizing $p(\mathbf{x})$ or by marginalizing $t_p(\mathbf{x}_E)$.


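Similar to the maximization of $t_p(\mathbf{x}_E)$, marginalization of $t_p(\mathbf{x}_E)$ can be accomplished by
the symbolwise decoder of the linear code with parity check matrix $\mathbf{H}$ defined in (5.43). When
the input vectors $\mathbf{y}_i = \mathcal{L}\{r_i(x)\}$ are applied to this symbolwise decoder it returns the marginal
probabilities associated with the APP

$$\Pr\{\mathbf{X}_E = \mathbf{x}_E \mid \mathbf{Y} = \mathbf{y}\} = \mathcal{C}_{\mathbb{F}_q^L}\Big\{\prod_{i=1}^{L} r_i(x_i) \prod_{i=1}^{L-N} \delta(\mathbf{h}_i\mathbf{x}_E^T)\Big\}, \tag{5.58}$$

which is equal to $t_p(\mathbf{x}_E)$. Hence, this decoder is capable of marginalizing $t_p(\mathbf{x}_E)$ and
consequently $p(\mathbf{x})$ at the same time.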


We could have reached the result given in (5.57) through a much shorter path by starting from the
definition of $t_p(\mathbf{x}_E)$ given in (5.32). We preferred the path followed above to this shorter path,
since the path above explains how we arrived at the proposed definition of $t_p(\mathbf{x}_E)$, which is
the most critical part of the previous section.


It is very well known that symbolwise decoding is an instance of marginalization problems in
general. In this section we showed that marginalization of multivariate pmfs can be expressed
as a particular symbolwise decoding problem. Hence, it can be concluded that symbolwise
decoding and marginalization are equivalent problems.
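
The equivalence can be checked on the smallest case, $N = 2$ over the binary field, where the code has parity check matrix $[1\;1\;1]$. The sketch below is an illustration under assumptions: the canonical factors are taken as $r_i(v) \propto \exp(u_i(1 - 2v))$ with $u_i$ obtained from a Walsh-Hadamard transform of $\log p$, and the symbolwise decoder is replaced by exhaustive summation over the codewords. The pmf values are hypothetical.

```python
import math
from itertools import product

# Hypothetical strictly positive pmf p(x1, x2).
p = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.15, (1, 1): 0.25}
a_vectors = [(1, 0), (0, 1), (1, 1)]   # a_1, a_2 canonical; a_3 is the single row of A

def sign(a, x):
    return (-1) ** (sum(ai * xi for ai, xi in zip(a, x)) % 2)

# Assumed binary-case canonical weights: log p(x) = const + sum_i u[i] * sign(a_i, x).
u = [sum(math.log(p[x]) * sign(a, x) for x in p) / 4 for a in a_vectors]

def r(i, v):
    """Degree one factor r_i evaluated at the symbol value v (unnormalized)."""
    return math.exp(u[i] * (1 - 2 * v))

# Codewords of the code with H = [1 1 1] (over F_2, -1 equals 1).
code = [c for c in product((0, 1), repeat=3) if sum(c) % 2 == 0]

def decoder_marginal(i):
    """Exhaustive symbolwise decoding: marginal of component i of x_E."""
    w = [0.0, 0.0]
    for c in code:
        w[c[i]] += r(0, c[0]) * r(1, c[1]) * r(2, c[2])
    total = w[0] + w[1]
    return [w[0] / total, w[1] / total]

def direct_marginal(i):
    """Marginalization sum (5.48) applied to p directly."""
    m = [0.0, 0.0]
    for x, v in p.items():
        m[x[i]] += v
    return m
```

For the first two components the two functions agree; the third decoder output is the pmf of the auxiliary variable $x_3 = x_1 + x_2$.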


5.5 The decoder of the dual Hamming code as the universal inference machine 


In the previous two sections we have shown that the ML codeword and symbolwise decoders
of the linear code with parity check matrix $\mathbf{H}$ can be used to maximize and marginalize any
pmf in $\mathcal{P}_{\mathbb{F}_q^N}$. This parity check matrix belongs to the dual code of a very well known code
from coding theory. Recall that the matrix $\mathbf{H}$ is defined as

$$\mathbf{H} = [\,\mathbf{A} \;\; {-\mathbf{I}_{(L-N) \times (L-N)}}\,] \tag{5.59}$$

$$= \begin{bmatrix} \mathbf{a}_{N+1} & -1 & 0 & \cdots & 0 \\ \mathbf{a}_{N+2} & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mathbf{a}_L & 0 & 0 & \cdots & -1 \end{bmatrix}. \tag{5.60}$$

The generator matrix of this code is

$$\mathbf{G} = [\,\mathbf{I}_{N \times N} \;\; \mathbf{A}^T\,]. \tag{5.61}$$

In Section 5.3 we assumed that the first $N$ vectors $\mathbf{a}_i$ are the canonical basis vectors of $\mathbb{F}_q^N$.
Therefore, the generator matrix can be written as

$$\mathbf{G} = [\,\mathbf{a}_1^T \;\; \mathbf{a}_2^T \;\; \ldots \;\; \mathbf{a}_L^T\,]. \tag{5.62}$$

Recall that all $\mathbf{a}_i$ vectors were pairwise linearly independent and $L$ was equal to $\frac{q^N-1}{q-1}$.
Therefore, the generator matrix $\mathbf{G}$ given above is actually the parity check matrix of the Hamming
code over $\mathbb{F}_q$ of length $L$. Hence, the parity check matrix $\mathbf{H}$ given in (5.59) is the parity check
matrix of the dual Hamming code over $\mathbb{F}_q$ of length $L$. Consequently, the ML codeword decoder
of the $(L, N)$ dual Hamming code can be configured by adjusting its inputs to maximize any
pmf in $\mathcal{P}_{\mathbb{F}_q^N}$. Similarly, the symbolwise decoder of the $(L, N)$ dual Hamming code can be used
as an apparatus to marginalize any pmf in $\mathcal{P}_{\mathbb{F}_q^N}$. Therefore, the decoders of the dual Hamming
codes are universal inference machines.
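
The identification with the dual Hamming code can be verified directly for $N = 3$ over the binary field. The sketch below builds $\mathbf{H} = [\mathbf{A}\;{-\mathbf{I}}]$ and $\mathbf{G} = [\mathbf{I}\;\mathbf{A}^T]$ from (5.59) and (5.61) and checks three properties; the particular ordering of the rows of $\mathbf{A}$ is an arbitrary choice made for the example.

```python
from itertools import product

N = 3
A = [a for a in product((0, 1), repeat=N) if sum(a) >= 2]   # rows a_{N+1}, ..., a_L
L = N + len(A)                                              # 7

# H = [A  -I] (over F_2, -I equals I) and G = [I  A^T].
H = [list(a) + [int(i == j) for j in range(L - N)] for i, a in enumerate(A)]
G = [[int(i == j) for j in range(N)] + [a[i] for a in A] for i in range(N)]

# The columns of G are exactly the vectors a_1^T, ..., a_L^T of (5.62).
columns_of_G = {tuple(G[i][j] for i in range(N)) for j in range(L)}

# Every row of G is orthogonal to every row of H.
orthogonal = all(
    sum(g * h for g, h in zip(g_row, h_row)) % 2 == 0
    for g_row in G for h_row in H
)

# The code generated by G: the (7, 3) dual Hamming (simplex) code.
codebook = {
    tuple(sum(m[i] * G[i][j] for i in range(N)) % 2 for j in range(L))
    for m in product((0, 1), repeat=N)
}
```

The columns of `G` run through all seven nonzero vectors of $\mathbb{F}_2^3$, which is the defining property of a Hamming parity check matrix, and every nonzero codeword has weight $2^{N-1} = 4$.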

5.6 Performing inference on special pmfs by decoders 

In the previous sections we have shown that the decoders of the $(L, N)$ dual Hamming code
designed for the channel model given in (5.2) can be used to perform inference on any pmf in
$\mathcal{P}_{\mathbb{F}_q^N}$. The analysis presented in the previous sections is for the most general case. Decoders
of shorter codes designed for simpler channel models can be employed to perform inference
on some pmfs enjoying special properties in their canonical factorization.

5.6.1 Performing inference with the decoders of shorter codes 

In Chapter 4 we investigated the canonical factorizations of some special pmfs. The canonical
factorization of these special pmfs consisted of fewer than $|\mathcal{H}|$ SPC factors. We can perform
inference on these special pmfs by using the decoders of the codes whose parity check
matrices are sub-matrices of the parity check matrix of the $(L, N)$ dual Hamming code.

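Suppose that we would like to perform inference on a special pmf $p(\mathbf{x}) \in \mathcal{P}_{\mathbb{F}_q^N}$ whose canonical
factorization can be expressed as

$$p(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{\mathbf{a}_i \in \mathcal{D}} r_i(\mathbf{a}_i\mathbf{x}^T)\Big\}, \tag{5.63}$$

where $\mathcal{D}$ is a subset of $\mathcal{H}$ and $r_i(\mathbf{a}_i\mathbf{x}^T)$ is the projection of $p(\mathbf{x})$ onto $\operatorname{im}\{\mathcal{S}_{\mathbf{a}_i}\}$. Let $\mathcal{B}$ be a
subset of $\mathcal{D}$ which consists of all of the parity check coefficient vectors in $\mathcal{D}$ of weight two
or more. Moreover, let $\mathbf{B}$ be a $|\mathcal{B}| \times N$ matrix whose rows are the vectors in $\mathcal{B}$. Then we may
define the substitute pmf $t_p(\mathbf{x}_F)$ which can be used to perform inference on $p(\mathbf{x})$ as

$$t_p(\mathbf{x}_F) \triangleq \begin{cases} p(\mathbf{x}), & \text{if } \mathbf{x}_B^T = \mathbf{B}\mathbf{x}^T \\ 0, & \text{otherwise,} \end{cases} \tag{5.64}$$

where $\mathbf{x}_B$ and $\mathbf{x}_F$ are given by

$$\mathbf{x}_B \triangleq [\,x_{N+1} \;\; x_{N+2} \;\; \ldots \;\; x_{|\mathcal{B}|+N}\,], \tag{5.65}$$

$$\mathbf{x}_F \triangleq [\,\mathbf{x} \;\; \mathbf{x}_B\,]. \tag{5.66}$$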

It can be shown through a path similar to the one in Section 5.3 and Section 5.4 that we can
maximize or marginalize $t_p(\mathbf{x}_F)$ if we wish to determine the configuration maximizing $p(\mathbf{x})$ or
to marginalize $p(\mathbf{x})$. Moreover, we can use the ML codeword and symbolwise decoders of the
linear code with parity check matrix

$$\mathbf{H}_{\mathcal{B}} = [\,\mathbf{B} \;\; {-\mathbf{I}}\,] \tag{5.67}$$

to maximize or marginalize $t_p(\mathbf{x}_F)$. Hence, these decoders can be used to maximize or
marginalize $p(\mathbf{x})$.

As in Section 5.3 and Section 5.4 the ML codeword and symbolwise decoders of the linear
code described by the parity check matrix $\mathbf{H}_{\mathcal{B}}$ should be configured to perform inference on $p(\mathbf{x})$
by applying a certain set of inputs. The $i$-th of these inputs is $\mathcal{L}\{v_i(x)\}$ where $v_i(\mathbf{b}_i\mathbf{x}^T)$ is
the projection of $p(\mathbf{x})$ onto $\operatorname{im}\{\mathcal{S}_{\mathbf{b}_i}\}$, and $\mathbf{b}_i$ is the $i$-th canonical basis vector of $\mathbb{F}_q^N$ if $i$ is less
than or equal to $N$ and the $(i - N)$-th row of $\mathbf{B}$ otherwise.

Since $\mathcal{D}$ is a subset of $\mathcal{H}$, $\mathbf{B}$ is a sub-matrix of $\mathbf{A}$ defined in (5.28). Consequently, $\mathbf{H}_{\mathcal{B}}$ is a
sub-matrix of $\mathbf{H}$ defined in (5.43). Therefore, implementing the decoder associated with $\mathbf{H}_{\mathcal{B}}$
is easier than implementing the decoder associated with $\mathbf{H}$.


Actually, there are many linear codes whose decoders can be employed to perform inference
on $t_p(\mathbf{x}_F)$ and $p(\mathbf{x})$ at the same time. For instance, the decoders of the linear codes with parity
check matrices in the form given below can be used in performing inference on $p(\mathbf{x})$,

$$\mathbf{H}_{\mathcal{B}E} = [\,\mathbf{C} \;\; {-\mathbf{I}}\,], \tag{5.68}$$

where $\mathbf{C}$ is a sub-matrix of $\mathbf{A}$ such that it contains all rows of $\mathbf{B}$ and some more. The linear
code with parity check matrix $\mathbf{H}_{\mathcal{B}}$ is the one with the shortest length among these codes. At
first glance, preferring the decoder of a longer code to a shorter one might seem useless while
solving the same inference problem. However, in the next chapter we are going to provide
some examples in which choosing the decoder of the longer code might be advantageous.

If a linear code has a parity check matrix which can be obtained by permuting the columns
of $\mathbf{H}_{\mathcal{B}E}$ then the decoders of this code can also be employed in maximizing or marginalizing
$p(\mathbf{x})$. However, in order to obtain the desired result we need to apply permuted inputs.


5.6.2 Performing inference by decoders designed for simpler channels 

Let d/ be the output alphabet of a communication channel which might be a finite set, real 
field, complex field, or a vector space. If fhere exists a sequence of channel outputs yi, y 2 , ..., 
y|-K| in d/ such that a multivariate pmf p{x) € can be expressed as 



(5.69) 


where $\Pr\{Y = y|X = x\}$ denotes the likelihood function of the channel, then the decoder of the dual Hamming code designed for this channel can be employed to perform inference on $p(\mathbf{x})$. The inputs that should be applied to this decoder to perform inference on $p(\mathbf{x})$ are obviously $y_1, y_2, \ldots, y_{|\mathcal{K}|}$.

In some problems preferring other channel models to the one described in (5.2) might be simpler. We selected the channel model therein since for each $r(x) \in \mathcal{P}_{\mathbb{F}_q}$ there exists a $y \in \mathcal{Y}$ such that $r(x) = \mathcal{C}_{\mathbb{F}_q}\{\Pr\{Y = y|X = x\}\}$. Consequently, the decoders of the dual Hamming code designed for this channel can be employed to perform inference on any pmf $p(\mathbf{x}) \in \mathcal{P}_{\mathbb{F}_q^N}$.
5.7 The Generic Factor Graph and Equivalent Tanner graph


In Chapter 3 it is shown that the canonical factorization of any multivariate pmf $p(\mathbf{x})$ in $\mathcal{P}_{\mathbb{F}_q^N}$ exists, which is given by

$$p(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\left\{\prod_{i=1}^{L} r_i(\mathbf{a}_i\mathbf{x}^T)\right\}. \qquad (5.70)$$

In this chapter, we made some assumptions on $\mathbf{a}_i$. We assumed without loss of generality that the first $N$ parity check coefficient vectors, $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_N$, are the canonical basis vectors of $\mathbb{F}_q^N$. Therefore, the canonical factorization of any $p(\mathbf{x})$ becomes



(5.71) 


The factor graph representing this factorization is shown in Figure 5.3a. This factor graph can represent any pmf in $\mathcal{P}_{\mathbb{F}_q^N}$ since all of the pmfs in $\mathcal{P}_{\mathbb{F}_q^N}$ have a factorization of the form given above. The only difference between any two factor graphs representing two different joint pmfs is the factor functions in the factor graph.

In this chapter, we showed that performing inference on $p(\mathbf{x})$ is equivalent to performing inference on $\varphi(\mathbf{x}_E)$ which is defined in (5.32). The factorization of $\varphi(\mathbf{x}_E)$ given in (5.40) is represented by the Tanner graph shown in Figure 5.3b. Hence, this Tanner graph is the equivalent Tanner graph representing the canonical factorization. While transforming the factor graph in Figure 5.3a to the Tanner graph in Figure 5.3b, auxiliary variable nodes representing the variables $x_{N+1}, x_{N+2}, \ldots, x_L$ are added. These auxiliary variables are very different from the hidden state nodes introduced in the Wiberg style Tanner graphs IH.

5.8 Importance 

Using channel decoders for inference tasks beyond decoding is important mainly in two aspects. Firstly, using a channel decoder for an inference task provides new hardware options in the solution of the inference problems. Among these hardware options the analog probability propagation technique is important in particular llT4ll . Secondly, new approximate algorithms for the solution of the inference problems can be developed by using the sub-optimal decoders of the codes, which have been studied for a long time.
[Figure 5.3, panels (a) and (b): nodes labeled with the factors $r_1(x_1), r_2(x_2), \ldots, r_N(x_N)$ and $r_{N+1}(x_{N+1}), r_{N+2}(x_{N+2}), \ldots, r_L(x_L)$, joined by connections corresponding to the parity check equations of the systematic $(L, L-N)$ dual Hamming code.]

Figure 5.3: (a) The generic factor graph which can represent any $p(\mathbf{x})$ in $\mathcal{P}_{\mathbb{F}_q^N}$. (b) The equivalent Tanner graph of the generic factor graph.
5.8.1 Performing inference with probability propagation in analog VLSI


The semiconductor devices such as transistors and diodes are the most primitive building 
blocks of any electronic device today. By their very nature these devices are nonlinear. Over 
the last few decades engineers developed ways to cope with this nonlinearity. While designing 
analog circuitry engineers restrict the operation of the circuit to such a region in which these 
devices behave almost linearly. Another way to cope with nonlinearity of these devices is 
avoiding analog circuits as much as possible and trying to implement everything in digital. 
The signals flowing in a digital circuit are so large that transistors behave like switches. Hence, 
digital circuits are robust against the nonlinearity of the transistors. Digital circuits are also 
robust against other factors such as component mismatch and noise. Due to these and some 
other advantages digital circuits are usually preferred to analog circuits. 

However, Carver Mead, who is one of the pioneers of the VLSI revolution, claimed in his 
book Il34l that digital computation is inefficient and analog computation is the way to achieve 
the capacity and the efficiency of the brains of the animals. Moreover, he claimed that analog 
computation can be made as robust as digital computation to the factors such as noise and 
component mismatch. He provided many practical examples to support his claims in his 
book. 

A decade after Mead's book, further evidence supporting his claims arose from coding theory. Just two operations are sufficient to perform soft-input soft-output decoding. These operations are addition, which can be easily implemented with analog circuitry, and the hyperbolic tangent function (I]. Since the differential pair exhibits a hyperbolic tangent characteristic, this second function can also be implemented with analog circuits. Motivated by this idea, Loeliger and his group designed and tested analog circuits to perform decoding of channel codes ifTSlfldll . They report that their analog decoding circuitry consumes two orders of magnitude less power than its digital counterparts. This efficiency arises from the fact that their analog decoding circuit does not fight with the nonlinearities of the transistors but exploits those nonlinearities ifTSll . They also report that these circuits are robust to component mismatch.
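
The two operations named above can be illustrated with a short sketch. The Python fragment below is not a model of the analog circuits; it only assumes binary variables and log-likelihood ratios defined as ln(P(0)/P(1)), and the function names are hypothetical. It shows that combining independent observations of one bit needs only addition, while the message through a parity check needs only the hyperbolic tangent (the well-known "tanh rule"), and it compares the latter against direct enumeration.

```python
import math
from itertools import product


def variable_node_update(llrs):
    """Independent LLRs about the same bit combine by plain addition."""
    return sum(llrs)


def check_node_update(llrs):
    """LLR of the XOR of independent bits, computed with the tanh rule."""
    prod = 1.0
    for llr in llrs:
        prod *= math.tanh(llr / 2.0)
    return 2.0 * math.atanh(prod)


def check_node_bruteforce(llrs):
    """Same quantity obtained by enumerating all bit patterns."""
    p0 = [1.0 / (1.0 + math.exp(-llr)) for llr in llrs]  # P(bit = 0)
    even = odd = 0.0
    for bits in product((0, 1), repeat=len(llrs)):
        pr = 1.0
        for b, p in zip(bits, p0):
            pr *= p if b == 0 else 1.0 - p
        if sum(bits) % 2 == 0:
            even += pr
        else:
            odd += pr
    return math.log(even / odd)


if __name__ == "__main__":
    llrs = [0.8, -1.3, 2.1]
    print(check_node_update(llrs), check_node_bruteforce(llrs))
```

Both functions return the same value up to rounding, which is why a sum-product decoder for binary codes can be built from adders and tanh-shaped elements alone.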

Loeliger's “probability propagation in analog VLSI” has an important limitation. This approach can be applied to probabilistic inference problems only if a condition related to the factorization of the multivariate pmf under concern is satisfied. This condition states that the pmf should be expressible as a product of zero-one valued functions and functions of degree one HU. Although this condition is satisfied in decoding problems, it is not satisfied in other problems arising in communication theory such as channel equalization and MIMO detection. Hence, equalizers or MIMO detectors could not be built directly with their brilliant idea whereas decoders could. A pure decoder implemented with probability propagation in analog is not very useful without implementing the equalizer or detector in analog, since the interface required between the decoder and the equalizer (or detector) would cancel all the efficiency of the analog decoder.

In this chapter, we showed that inference problems can be solved by using channel decoders. Hence, it is possible to solve the equalization or MIMO detection problems by decoders. Consequently, the results presented in this chapter allow us to implement channel equalizers or MIMO detectors with the very efficient analog probability propagation approach. It is reasonable to expect, based on the experience with analog decoding, that such receiver blocks would be two orders of magnitude smaller in size and consume two orders of magnitude less power than current receivers. Probably this aspect will be the most important contribution of this thesis.

5.8.2 New approximate inference algorithms 

The iterative sum-product algorithm running on loopy Tanner graphs has proven to be an efficient decoding algorithm for various codes. The sum-product algorithm is characterized by the Tanner graph representing the code. A code might be represented with many different parity check matrices. For each parity check matrix, more than one Tanner graph representing the code might be obtained. Hence, for each code we have various alternative Tanner graphs to represent the code. Consequently, we may have various versions of the sum-product algorithm to decode the same code. Each of these alternative versions has different characteristics in terms of complexity and performance Q. Therefore, employing a channel decoder to perform an inference task allows us to choose among different sum-product algorithm versions to handle the inference task. Hence, new approximate inference algorithms can be developed in this manner. We provide an example on MIMO detection in the next chapter.
CHAPTER 6


USING CHANNEL DECODERS AS DETECTORS 


6.1 Introduction 


This chapter contains examples of the idea presented in Chapter 5 by showing how to employ channel decoders as the detectors of communication receivers. One of these examples, MIMO detection by using the decoder of a tail biting convolutional code, demonstrates that new inference algorithms with low complexity can be developed by employing channel decoders for other purposes.

Unfortunately, some of the derivations presented in this chapter might appear quite tedious, Sections 6.3, 6.4 and 6.5 in particular. Actually, the derivations in these sections are straightforward applications of the methods presented in the previous chapter. Most of these derivations are so straightforward that they can be carried out with symbolic programming. Indeed, we used the GiNaC symbolic programming library in C++ while carrying out some of the cumbersome derivations presented in this chapter. Hence, reporting and following these derivations is much more difficult than deriving them. However, these sections include examples to make the subject more concrete. These examples also demonstrate how the same decoder can be used for different purposes by changing its inputs.

This chapter begins with analyzing multiple-input single-output (MISO) detection. Then the results obtained in Section 6.2 are used to derive the channel decoder which can be used in the detection of naturally mapped pulse amplitude modulation (PAM) signals in Section 6.3. Section 6.4 explains the detection of gray mapped PAM signals by using channel decoders. Section 6.5 investigates the multiple-input multiple-output (MIMO) detection of QPSK signals by using decoders. Special attention is paid to the MIMO detection of QPSK signals by using the decoders of tail biting convolutional codes in Section 6.6. This section also includes some simulation results. This chapter ends with briefly reporting that the Viterbi and BCJR decoders of the convolutional codes can be used as channel equalizers.

6.2 MISO detection of q-ary PSK signaling with prime q by using a channel decoder

The MISO detection of $q$-ary PSK signaling under additive Gaussian noise is the simplest task (in terms of derivation) to be handled by a decoder. Moreover, analyzing this case first helps to transform other detection problems to decoding problems. The ML MISO detection task is finding the most likely input sequence given the received symbol. This task can be handled by ML codeword decoders. Soft output MISO detection is the computation of marginal a posteriori probabilities. This task can be handled by symbolwise decoders.

6.2.1 Signal Model 

Let $\mu_q(x)$ be a function from $\mathbb{F}_q$ to $\mathbb{C}$ representing the $q$-ary PSK¹ mapping, i.e.

$$\mu_q(x) \triangleq \exp\left(j\frac{2\pi}{q}\,\mathrm{int}(x)\right), \qquad (6.1)$$

where int(.) denotes the usual mapping from $\mathbb{F}_q$ to $\mathbb{N}$. Let a complex-valued random variable $Y$ be related to an $\mathbb{F}_q^N$-valued random vector $\mathbf{X} = [X_1, X_2, \ldots, X_N]$ as

$$Y = \sum_{i=1}^{N} h_i\,\mu_q(X_i) + Z, \qquad (6.2)$$

where $h_i$ is a complex constant and $Z$ is zero mean circularly symmetric complex Gaussian noise with $E[ZZ^*] = 2\sigma^2$. Clearly, $Y$ models the received symbol after the symbols $X_1, X_2, \ldots, X_N$ are modulated with $q$-ary PSK and passed through a $1 \times N$ multi-input single output (MISO) channel with channel coefficients $h_i$. With these assumptions the a posteriori pmf of $\mathbf{X}$ is

$$\Pr\{\mathbf{X} = \mathbf{x}\,|\,Y = y\} = \mathcal{C}_{\mathbb{F}_q^N}\left\{\exp\left(-\frac{\left|y - \sum_{i=1}^{N} h_i\,\mu_q(x_i)\right|^2}{2\sigma^2}\right)\right\}, \qquad (6.3)$$

where $\mathbf{x} = [x_1, x_2, \ldots, x_N]$. We assume perfect channel information is known at the receiver side.

¹ $q$-ary PSK is not the same as QPSK.
6.2.2 The canonical factorization of the joint a posteriori pmf


The first step in determining the linear code whose ML codeword (symbolwise) decoder can be used to maximize (marginalize) the a posteriori pmf $\Pr\{\mathbf{X} = \mathbf{x}\,|\,Y = y\}$ is obtaining the canonical factorization of the a posteriori pmf of $\mathbf{X}$. The generic procedure of obtaining this canonical factorization is explained in detail in Chapter 3, and it could have been prohibitively tedious for this problem. Fortunately, the joint a posteriori pmf in this problem, $\Pr\{\mathbf{X} = \mathbf{x}\,|\,Y = y\}$, enjoys many special properties so that deriving its canonical factorization is easier.

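Let $p(\mathbf{x})$ denote $\Pr\{\mathbf{X} = \mathbf{x}\,|\,Y = y\}$. As shown in Appendix A.4.1, $p(\mathbf{x})$ can be factored as in

$$p(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\left\{\prod_{i=1}^{N}\exp\left(\frac{2\,\mathrm{Re}\{y h_i^*\,\mu_q(x_i)^*\}}{2\sigma^2}\right)\prod_{j=2}^{N}\prod_{i=1}^{j-1}\exp\left(-\frac{2\,\mathrm{Re}\{h_i h_j^*\,\mu_q(x_i)\,\mu_q(x_j)^*\}}{2\sigma^2}\right)\right\}. \qquad (6.4)$$

Since we have a known factorization for $p(\mathbf{x})$, we can apply Theorem 4.4 to obtain the canonical factorization of $p(\mathbf{x})$ as explained below.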
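
The split of the a posteriori pmf into degree-one and degree-two factors can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the channel coefficients and the observation are arbitrary test values, and `sigma2` stands for $\sigma^2$. It evaluates the pmf once directly from the Gaussian likelihood of the MISO signal model and once as a normalized product of one exponential per symbol and one exponential per pair of symbols, obtained by expanding the squared distance.

```python
import cmath
import math
from itertools import product


def mu(x, q):
    """q-ary PSK mapping: exp(j * 2*pi/q * int(x))."""
    return cmath.exp(2j * math.pi * x / q)


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def app_direct(y, h, q, sigma2):
    """A posteriori pmf from the Gaussian likelihood, by enumeration."""
    vals = []
    for x in product(range(q), repeat=len(h)):
        s = sum(hi * mu(xi, q) for hi, xi in zip(h, x))
        vals.append(math.exp(-abs(y - s) ** 2 / (2 * sigma2)))
    return normalize(vals)


def app_factored(y, h, q, sigma2):
    """Same pmf as a product of degree-one and degree-two factors."""
    n = len(h)
    vals = []
    for x in product(range(q), repeat=n):
        v = 1.0
        for i in range(n):  # degree-one factors
            t = y * h[i].conjugate() * mu(x[i], q).conjugate()
            v *= math.exp(2 * t.real / (2 * sigma2))
        for j in range(1, n):  # degree-two factors, i < j
            for i in range(j):
                t = h[i] * h[j].conjugate() * mu(x[i], q) * mu(x[j], q).conjugate()
                v *= math.exp(-2 * t.real / (2 * sigma2))
        vals.append(v)
    return normalize(vals)


if __name__ == "__main__":
    h = [1 + 0.5j, -0.3 + 0.8j, 0.6 - 0.2j]
    d = app_direct(0.4 - 0.7j, h, 3, 1.0)
    f = app_factored(0.4 - 0.7j, h, 3, 1.0)
    print(max(abs(a - b) for a, b in zip(d, f)))
```

The terms $|y|^2$ and $|h_i|^2$ do not depend on $\mathbf{x}$, so they disappear in the normalization; this is why no factor of degree higher than two is needed.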


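Let two functions $\gamma(\omega; \rho, \sigma)$ and $\theta(\omega_1, \omega_2; \chi, \sigma)$ be defined as in

$$\gamma(\omega; \rho, \sigma) \triangleq \exp\left(\frac{2\,\mathrm{Re}\{\rho\,\mu_q(\omega)^*\}}{2\sigma^2}\right), \qquad (6.5)$$

$$\theta(\omega_1, \omega_2; \chi, \sigma) \triangleq \exp\left(-\frac{2\,\mathrm{Re}\{\chi\,\mu_q(\omega_1)\,\mu_q(\omega_2)^*\}}{2\sigma^2}\right), \qquad (6.6)$$

for $\omega, \omega_1, \omega_2$ in $\mathbb{F}_q$, $\sigma$ in $\mathbb{R}$, and $\rho, \chi$ in $\mathbb{C}$. The function $\gamma(\omega; \rho, \sigma)$ is nothing but the likelihood function of $\omega$ when it is modulated with $q$-ary PSK, passed through an additive white Gaussian noise (AWGN) channel with power spectral density (PSD) $N_0/2 = \sigma^2$, and given that the value at the output of the matched filter is $\rho$. Using these functions the factorization of $p(\mathbf{x})$ becomes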


$$p(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\left\{\prod_{i=1}^{N}\gamma(x_i; y h_i^*, \sigma)\prod_{j=2}^{N}\prod_{i=1}^{j-1}\theta(x_i, x_j; h_i h_j^*, \sigma)\right\}. \qquad (6.7)$$

Notice that the factorization of $p(\mathbf{x})$ given above is composed of degree one and degree two factors only². Therefore, the canonical factorization of $p(\mathbf{x})$ should be composed of SPC factors of degree one and two due to Theorem 4.4. The SPC factors of degree one composing $p(\mathbf{x})$ are simply the normalizations of the $\gamma(x_i; y h_i^*, \sigma)$'s.


² We regard $\rho$, $\chi$ and $\sigma$ as parameters of the functions $\gamma(.;.)$ and $\theta(.;.)$, not their arguments.

The SPC factors of degree two composing $p(\mathbf{x})$ can be derived by obtaining the canonical factorization of $\theta(x_i, x_j; h_i h_j^*, \sigma)$. The straightforward way of deriving the canonical factorization of $\theta(x_i, x_j; h_i h_j^*, \sigma)$ might be projecting this function onto the subspaces $\mathrm{im}\{S_{\mathbf{f}_i + \alpha\mathbf{f}_j}\}$ for all nonzero $\alpha \in \mathbb{F}_q$, where $\mathbf{f}_i$ is the $i$th canonical basis vector of $\mathbb{F}_q^N$. However, the required canonical factorization can be obtained in a simpler way by exploiting the fact that $q$ is assumed to be a prime number in this section. Since $q$ is a prime number, $\mathbb{F}_q$ is a prime field. Consequently, the subtraction in $\mathbb{F}_q$ is the subtraction modulo $q$. Due to this fact,

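$$\mu_q(\omega_1)\,\mu_q(\omega_2)^* = \mu_q(\omega_1 - \omega_2). \qquad (6.8)$$

Therefore,

$$\theta(\omega_1, \omega_2; \chi, \sigma) = \exp\left(-\frac{2\,\mathrm{Re}\{\chi\,\mu_q(\omega_1 - \omega_2)\}}{2\sigma^2}\right) \qquad (6.9)$$

$$= \gamma(\omega_1 - \omega_2; -\chi, \sigma). \qquad (6.10)$$

Inserting this result into (6.7) yields

$$p(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\left\{\prod_{i=1}^{N}\gamma(x_i; y h_i^*, \sigma)\prod_{j=2}^{N}\prod_{i=1}^{j-1}\gamma(x_i - x_j; -h_i h_j^*, \sigma)\right\}. \qquad (6.11)$$

We can define pmfs in $\mathcal{P}_{\mathbb{F}_q}$ by scaling $\gamma(x; y h_i^*, \sigma)$ and $\gamma(x; -h_i h_j^*, \sigma)$ as in

$$r_i(x) \triangleq \mathcal{C}_{\mathbb{F}_q}\{\gamma(x; y h_i^*, \sigma)\}, \qquad (6.12)$$

$$r_{i,j}(x) \triangleq \mathcal{C}_{\mathbb{F}_q}\{\gamma(x; -h_i h_j^*, \sigma)\}. \qquad (6.13)$$

Then $p(\mathbf{x})$ can be expressed by using these pmfs as

$$p(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\left\{\prod_{i=1}^{N} r_i(x_i)\prod_{j=2}^{N}\prod_{i=1}^{j-1} r_{i,j}(x_i - x_j)\right\} \qquad (6.14)$$

$$= \mathcal{C}_{\mathbb{F}_q^N}\left\{\prod_{i=1}^{N} r_i(\mathbf{f}_i\mathbf{x}^T)\prod_{j=2}^{N}\prod_{i=1}^{j-1} r_{i,j}(\mathbf{a}_{i,j}\mathbf{x}^T)\right\}, \qquad (6.15)$$

where $\mathbf{a}_{i,j}$ is

$$\mathbf{a}_{i,j} \triangleq \mathbf{f}_i - \mathbf{f}_j. \qquad (6.16)$$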

Notice that all the factor functions in the factorization above are SPC factors. Moreover, the parity check coefficient vectors of all SPC factors are pairwise linearly independent. Hence, due to Definition HI the factorization of $p(\mathbf{x})$ given in (6.15) is the canonical factorization of $p(\mathbf{x})$.


6.2.3 The decoders which are able to perform inference on the joint a posteriori pmf

As explained in Section [53] the ML codeword and symbolwise decoders of the dual Hamming code of length $\frac{q^N-1}{q-1}$ perform inference on $p(\mathbf{x})$. However, since the canonical factorization of $p(\mathbf{x})$ given in (6.15) consists of fewer than $\frac{q^N-1}{q-1}$ SPC factors, the ML codeword or symbolwise decoders of a shorter code can be employed for maximizing and marginalizing $p(\mathbf{x})$ as discussed in Section 1531. Following the discussion in Section 15^, the parity check matrix of the code whose decoder can be employed in the demodulation of the $1 \times N$ MISO system is


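$$\mathbf{H}_{qPSK}(N) = \left[\begin{array}{c|c}\begin{matrix}\mathbf{a}_{1,2}\\ \mathbf{a}_{1,3}\\ \mathbf{a}_{2,3}\\ \vdots\\ \mathbf{a}_{1,N}\\ \mathbf{a}_{2,N}\\ \vdots\\ \mathbf{a}_{N-1,N}\end{matrix} & -\mathbf{I}_{\frac{N(N-1)}{2}}\end{array}\right]. \qquad (6.17)$$
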

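For a neater representation of $\mathbf{H}_{qPSK}(N)$, we define a matrix $\mathbf{K}(i, N)$ parameterized on $i$ and $N$ as

$$\mathbf{K}(i, N) \triangleq \left[\mathbf{I}_{i\times i} \;\; -\mathbf{1}_{i\times 1} \;\; \mathbf{0}_{i\times(N-i-1)}\right]. \qquad (6.18)$$

Then $\mathbf{H}_{qPSK}(N)$ can be expressed as

$$\mathbf{H}_{qPSK}(N) = \left[\begin{array}{c|c}\begin{matrix}\mathbf{K}(1, N)\\ \mathbf{K}(2, N)\\ \vdots\\ \mathbf{K}(N-1, N)\end{matrix} & -\mathbf{I}_{\frac{N(N-1)}{2}}\end{array}\right]. \qquad (6.19)$$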
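
The block structure in (6.18) and (6.19) translates directly into code. The sketch below is an illustration with hypothetical function names (`K`, `H_qpsk`); it stores the matrix as a list of rows and reduces every entry modulo the prime $q$, so that $-1$ is stored as $q-1$.

```python
def K(i, N):
    """K(i, N) = [I_{i x i}  -1_{i x 1}  0_{i x (N-i-1)}] as a list of rows."""
    rows = []
    for r in range(i):
        row = [0] * N
        row[r] = 1    # identity block
        row[i] = -1   # column of minus ones (0-based column index i)
        rows.append(row)
    return rows


def H_qpsk(N, q):
    """Stack K(1, N), ..., K(N-1, N) and append -I, entries reduced mod q."""
    top = [row for i in range(1, N) for row in K(i, N)]
    m = len(top)  # equals N(N-1)/2
    H = []
    for r, row in enumerate(top):
        ident = [0] * m
        ident[r] = -1
        H.append([v % q for v in row + ident])
    return H


if __name__ == "__main__":
    for row in H_qpsk(4, 3):
        print(row)
```

For $N = 4$ the result has 6 rows and 10 columns, and the rows of the left block appear in the order $\mathbf{a}_{1,2}, \mathbf{a}_{1,3}, \mathbf{a}_{2,3}, \mathbf{a}_{1,4}, \mathbf{a}_{2,4}, \mathbf{a}_{3,4}$.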


The complete specification of a decoder of a linear code consists of a parity check matrix and a channel model. The parity check matrix of the decoders which can detect the received symbols of a $1 \times N$ MISO system is explained above. As the channel model we can use the one described in (5.2). However, we can use a more natural channel model in this case as explained in Section 5.6.2. Recall that the factorization of $p(\mathbf{x})$ given in (6.11) is composed of likelihood functions of the channel which first modulates an $\mathbb{F}_q$-valued symbol with $q$-ary PSK and then passes it through an AWGN channel with PSD $N_0/2 = \sigma^2$. Therefore, the received symbols of a $1 \times N$ MISO system can be detected with the decoders of the code with parity check matrix $\mathbf{H}_{qPSK}(N)$ which are designed for $q$-ary PSK modulation and an AWGN channel with variance $\sigma^2$. In order to achieve the desired detection, the inputs that should be applied to these decoders are the components of the vector given below.

$$\left[\,y h_1^* \;\; y h_2^* \;\; \ldots \;\; y h_N^* \;\; -h_1 h_2^* \;\; -h_1 h_3^* \;\; -h_2 h_3^* \;\; \ldots \;\; -h_1 h_N^* \;\; -h_2 h_N^* \;\; \ldots \;\; -h_{N-1} h_N^*\,\right]$$

We can also use a modification of the same decoder which is designed for standard noise with $\sigma^2 = 1$. In this case all of the inputs given above should be scaled by $\frac{1}{\sigma^2}$.


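Example 6.1 This example demonstrates how we can employ a symbolwise decoder to compute the marginal APPs in a $1 \times 4$ MISO system. Let a complex-valued random variable $Y$ be given as

$$Y = \sum_{i=1}^{4} h_i\,\mu_q(X_i) + Z, \qquad (6.20)$$

where $X_i$ is an $\mathbb{F}_q$-valued random variable and $Z$ is circularly symmetric Gaussian noise with $E[ZZ^*] = 2$. Our aim is to compute $\Pr\{X_i = x_i\,|\,Y = y\}$ by using a symbolwise decoder. As explained above the parity check matrix of this decoder is

$$\mathbf{H}_{qPSK}(4) = \left[\begin{array}{c|c}\begin{matrix}\mathbf{K}(1,4)\\ \mathbf{K}(2,4)\\ \mathbf{K}(3,4)\end{matrix} & -\mathbf{I}_{6\times 6}\end{array}\right] \qquad (6.21)$$

$$= \left[\begin{array}{rrrr|rrrrrr}
1 & -1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & -1 & 0 & 0 & -1 & 0 & 0 & 0 & 0\\
0 & 1 & -1 & 0 & 0 & 0 & -1 & 0 & 0 & 0\\
1 & 0 & 0 & -1 & 0 & 0 & 0 & -1 & 0 & 0\\
0 & 1 & 0 & -1 & 0 & 0 & 0 & 0 & -1 & 0\\
0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 & -1
\end{array}\right]. \qquad (6.22)$$

The input vector that should be applied to this decoder is

$$\left[\,y h_1^* \;\; y h_2^* \;\; y h_3^* \;\; y h_4^* \;\; -h_1 h_2^* \;\; -h_1 h_3^* \;\; -h_2 h_3^* \;\; -h_1 h_4^* \;\; -h_2 h_4^* \;\; -h_3 h_4^*\,\right].$$

Notice that configuring the demodulator for a new observation and a new set of channel coefficients requires only changing the inputs to the decoder. This example is illustrated in Figure 6.1.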


[Figure 6.1, panel (b): the decoder inputs are $y h_1^*, y h_2^*, y h_3^*, y h_4^*, -h_1 h_2^*, -h_1 h_3^*, -h_2 h_3^*, -h_1 h_4^*, -h_2 h_4^*, -h_3 h_4^*$; the first four outputs are the marginal APPs of $X_1, X_2, X_3$, and $X_4$, and the remaining outputs are ignored.]

Figure 6.1: $1 \times 4$ MISO system. (a) The system model. (b) Demodulating the received symbol by using a symbolwise decoder.

6.3 Channel decoders as detectors of naturally mapped M-PAM

In this section we show how to demodulate the naturally mapped M-PAM modulation by using a channel decoder. Let $\eta_N(\mathbf{x})$ be a function from $\mathbb{F}_2^N$ to $\mathbb{R}$ which maps binary valued vectors of length $N$ to $M = 2^N$ real amplitude values as in the naturally mapped PAM modulation, i.e.

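$$\eta_N(\mathbf{x}) \triangleq \sum_{i=1}^{N} 2^{i-1}\beta(x_i), \qquad (6.23)$$

where $\mathbf{x} = [x_1, x_2, \ldots, x_N]$ and $\beta(x)$ denotes the binary antipodal mapping given in

$$\beta(x) \triangleq \begin{cases} 1, & x = 0\\ -1, & x = 1 \end{cases}. \qquad (6.24)$$

Assume that $\eta_N(\mathbf{X})$ is transmitted through a discrete additive Gaussian noise channel and $Y$ is received. In other words,

$$Y = \eta_N(\mathbf{X}) + Z, \qquad (6.25)$$

where $\mathbf{X} = [X_1, X_2, \ldots, X_N]$ and $Z$ is a real Gaussian random variable with variance $\sigma^2$. Inserting the definition of $\eta_N(\mathbf{X})$ into (6.25) yields

$$Y = \sum_{i=1}^{N} 2^{i-1}\beta(X_i) + Z. \qquad (6.26)$$

Since $\beta(X_i)$ is equal to $\mu_q(X_i)$ for $q = 2$ and 2 is a prime number, (6.26) is a special case of (6.2). Consequently, naturally mapped M-PAM detection is a special case of MISO detection of binary phase shift keying (BPSK) with channel coefficients $h_i = 2^{i-1}$. Hence, the parity check matrix of the code whose decoder can demodulate M-PAM is $\mathbf{H}_{2PSK}(\log_2 M)$. Since $-1$ is equal to 1 in the binary field, all of the minus ones in $\mathbf{H}_{2PSK}(\log_2 M)$ can be replaced with ones. The input vector that should be applied to the decoder in order to achieve demodulation of M-PAM is

$$\left[\,\frac{y}{\sigma^2} \;\; \frac{2y}{\sigma^2} \;\; \ldots \;\; \frac{2^{N-1}y}{\sigma^2} \;\; -\frac{2^0 2^1}{\sigma^2} \;\; -\frac{2^0 2^2}{\sigma^2} \;\; -\frac{2^1 2^2}{\sigma^2} \;\; -\frac{2^0 2^3}{\sigma^2} \;\; -\frac{2^1 2^3}{\sigma^2} \;\; -\frac{2^2 2^3}{\sigma^2} \;\; \ldots \;\; -\frac{2^0 2^{N-1}}{\sigma^2} \;\; -\frac{2^1 2^{N-1}}{\sigma^2} \;\; \ldots \;\; -\frac{2^{N-2} 2^{N-1}}{\sigma^2}\,\right]$$

where $y$ denotes the received value.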

where y denotes the received value. 

Implementing an ML M-PAM detector by using the ML codeword decoder of the code with parity check matrix $\mathbf{H}_{2PSK}(\log_2 M)$ might not be practical since there are simpler ways to implement such a detector. However, implementing a soft output M-PAM detector by using the symbolwise decoder of the same code might be of practical importance.
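
The reduction of naturally mapped M-PAM detection to decoding can be checked by brute force for small $N$. The Python sketch below is illustrative only and rests on stated assumptions: the decoder is taken to be designed for BPSK over an AWGN channel with unit noise variance, so that the likelihood of a code symbol $c$ given an input $\rho$ is proportional to $\exp(\rho\,\beta(c))$; all function names are hypothetical and `sigma2` stands for $\sigma^2$. It builds the input vector for channel coefficients $h_i = 2^{i-1}$, multiplies the per-symbol likelihoods over the codeword formed by the bits and their pairwise sums, and compares the normalized result with the APP computed directly from the Gaussian likelihood.

```python
import math
from itertools import combinations, product


def beta(b):
    """Binary antipodal mapping: 0 -> +1, 1 -> -1."""
    return 1.0 if b == 0 else -1.0


def pair_order(N):
    """Index pairs (i, j), i < j, in the row order of the parity check matrix."""
    return sorted(combinations(range(N), 2), key=lambda p: (p[1], p[0]))


def pam_decoder_inputs(y, N, sigma2):
    """Inputs for a unit-variance BPSK decoder: y*h_i and -h_i*h_j, over sigma^2."""
    h = [2.0 ** i for i in range(N)]
    inputs = [y * hi / sigma2 for hi in h]
    inputs += [-h[i] * h[j] / sigma2 for i, j in pair_order(N)]
    return inputs


def app_via_code(y, N, sigma2):
    """Normalized product of per-symbol likelihoods over all codewords."""
    inputs = pam_decoder_inputs(y, N, sigma2)
    vals = []
    for x in product((0, 1), repeat=N):
        codeword = list(x) + [(x[i] + x[j]) % 2 for i, j in pair_order(N)]
        vals.append(math.exp(sum(r * beta(c) for r, c in zip(inputs, codeword))))
    total = sum(vals)
    return [v / total for v in vals]


def app_direct(y, N, sigma2):
    """APP computed directly from the Gaussian likelihood."""
    vals = []
    for x in product((0, 1), repeat=N):
        amp = sum(2.0 ** i * beta(b) for i, b in enumerate(x))
        vals.append(math.exp(-(y - amp) ** 2 / (2 * sigma2)))
    total = sum(vals)
    return [v / total for v in vals]


if __name__ == "__main__":
    print(pam_decoder_inputs(1.0, 4, 1.0))
    a, b = app_via_code(3.2, 4, 4.0), app_direct(3.2, 4, 4.0)
    print(max(abs(u - v) for u, v in zip(a, b)))
```

Marginalizing either list over all but one bit gives the same soft output, which is what a symbolwise decoder of the code would deliver on its first $N$ positions.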

Example 6.2 This example shows how to compute marginal APPs of four bits which are modulated with naturally mapped 16-PAM and passed through an AWGN channel with PSD $N_0/2 = \sigma^2$. The constellation diagram of the naturally mapped 16-PAM is shown in Figure 6.2a.

Figure 6.2: (a) Constellation diagram of naturally mapped 16-PAM modulation. (b) Computing marginal APPs from the received symbol by using the symbolwise decoder of $\mathbf{H}_{2PSK}(4)$.

The parity check matrix of the code whose symbolwise decoder can be used to compute marginal APPs of the individual bits is

$$\mathbf{H}_{2PSK}(4) = \left[\begin{array}{cccc|cccccc}
1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0\\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\right]. \qquad (6.27)$$

Notice that this parity check matrix is a special case of the $\mathbf{H}_{qPSK}(4)$ matrix given in the previous example for $q = 2$. Since $-1$ is equal to 1 in the binary field, minus ones in that matrix are replaced with plus ones.

If the received value is denoted with $y$ then the input vector that should be applied to this decoder is

$$\left[\,\frac{y}{\sigma^2} \;\; \frac{2y}{\sigma^2} \;\; \frac{4y}{\sigma^2} \;\; \frac{8y}{\sigma^2} \;\; -\frac{2}{\sigma^2} \;\; -\frac{4}{\sigma^2} \;\; -\frac{8}{\sigma^2} \;\; -\frac{8}{\sigma^2} \;\; -\frac{16}{\sigma^2} \;\; -\frac{32}{\sigma^2}\,\right].$$

This example is illustrated in Figure 6.2b.
6.4 Channel decoders as the detectors of gray mapped M-PAM


Naturally mapped M-PAM, whose detection by using a decoder is investigated in the previous section, suffers from the fact that more than one bit may differ between two adjacent symbols. This problem is overcome with the gray mapping in which only one bit differs between two adjacent symbols. In this section detection of gray mapped M-PAM by using a decoder is investigated. Let $\kappa_N(\mathbf{x})$ be a function from $\mathbb{F}_2^N$ to $\mathbb{R}$ which maps binary valued vectors of length $N$ to $M = 2^N$ real amplitude values as in the gray mapped M-PAM, i.e.

$$\kappa_N(\mathbf{x}) \triangleq \sum_{i=1}^{N} 2^{i-1}\beta\left(\sum_{k=i}^{N} x_k\right), \qquad (6.28)$$

where $\mathbf{x} = [x_1, x_2, \ldots, x_N]$ and the summation inside the $\beta(.)$ function takes place in $\mathbb{F}_2$. Unfortunately, due to this summation inside the $\beta(.)$ function, detection of gray mapped M-PAM is not a special case of MISO detection of BPSK, as opposed to the detection of naturally mapped M-PAM. Hence, in order to determine the parity check matrix and inputs of the decoder to detect the gray mapped M-PAM we need to obtain the canonical factorization of the joint a posteriori pmf.

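Assume that $\kappa_N(\mathbf{X})$ is transmitted through a discrete additive Gaussian noise channel and $Y$ is received. In other words,

$$Y = \kappa_N(\mathbf{X}) + Z, \qquad (6.29)$$

where $\mathbf{X} = [X_1, X_2, \ldots, X_N]$ and $Z$ is a real Gaussian random variable with variance $\sigma^2$. Let $p(\mathbf{x})$ denote the joint a posteriori probability $\Pr\{\mathbf{X} = \mathbf{x}\,|\,Y = y\}$. The canonical factorization of $p(\mathbf{x})$ can be obtained by following the generic procedures explained in Chapter 3. However, the canonical factorization of $p(\mathbf{x})$ can be obtained more easily by exploiting the relation between $\kappa_N(\mathbf{x})$ and $\eta_N(\mathbf{x})$.

The relation between $\kappa_N(\mathbf{x})$ and $\eta_N(\mathbf{x})$ can be expressed as in

$$\kappa_N(\mathbf{x}) = \eta_N(\mathbf{x}\mathbf{G}(N)), \qquad (6.30)$$

where $\mathbf{G}(N)$ is the $N \times N$ matrix defined as

$$\mathbf{G}(N) \triangleq \begin{bmatrix} 1 & 0 & \cdots & 0\\ 1 & 1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 1 & 1 & \cdots & 1 \end{bmatrix}. \qquad (6.31)$$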
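
The relation between the gray and natural mappings can be verified by enumeration. The Python sketch below is an illustration under the assumption that the gray mapping applies $\beta$ to the $\mathbb{F}_2$ sum of the bits $x_i, \ldots, x_N$ with weight $2^{i-1}$; the function names are hypothetical. It checks that this mapping equals the natural mapping of $\mathbf{x}\mathbf{G}(N)$ with the lower triangular all-ones matrix, and it measures the largest number of differing bits between labels of adjacent amplitudes for both mappings.

```python
from itertools import product


def beta(b):
    """Binary antipodal mapping: 0 -> +1, 1 -> -1."""
    return 1 if b == 0 else -1


def g_matrix(N):
    """Lower triangular all-ones matrix G(N)."""
    return [[1 if k <= m else 0 for k in range(N)] for m in range(N)]


def times_matrix_mod2(x, G):
    """Row vector times matrix over F_2."""
    return [sum(x[m] * G[m][k] for m in range(len(x))) % 2 for k in range(len(G[0]))]


def eta(w):
    """Natural mapping: sum of 2^(i-1) * beta(w_i)."""
    return sum(2 ** i * beta(b) for i, b in enumerate(w))


def kappa(x):
    """Gray mapping: beta is applied to the F_2 sum of x_i, ..., x_N."""
    return sum(2 ** i * beta(sum(x[i:]) % 2) for i in range(len(x)))


def max_adjacent_distance(mapping, N):
    """Largest Hamming distance between labels of neighbouring amplitudes."""
    labels = sorted(product((0, 1), repeat=N), key=mapping)
    return max(sum(a != b for a, b in zip(u, v)) for u, v in zip(labels, labels[1:]))


if __name__ == "__main__":
    N = 4
    G = g_matrix(N)
    print(all(kappa(x) == eta(times_matrix_mod2(x, G)) for x in product((0, 1), repeat=N)))
    print(max_adjacent_distance(kappa, N), max_adjacent_distance(eta, N))
```

For $N = 4$ the gray mapping gives a largest adjacent distance of one bit, whereas the natural mapping reaches four bits between the two middle amplitudes.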
Since $\mathbf{G}(N)$ is an invertible matrix, the canonical factorization of $p(\mathbf{x})$ can be derived from the canonical factorization of the APP $\Pr\{\mathbf{W} = \mathbf{w}\,|\,\eta_N(\mathbf{W}) + Z = y\}$ by following the discussion in
Section 


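Let $t(\mathbf{w})$ be the shorthand notation for the APP $\Pr\{\mathbf{W} = \mathbf{w}\,|\,\eta_N(\mathbf{W}) + Z = y\}$. Since $t(\mathbf{w})$ represents the APP in the naturally mapped M-PAM case, its canonical factorization is a special case of the canonical factorization given in (6.15) with channel coefficients $h_i = 2^{i-1}$ and $\mu_q(w) = \beta(w)$. The $\gamma(w; \rho, \sigma)$ function for the BPSK modulation is

$$\gamma(w; \rho, \sigma) = \exp\left(\frac{2\rho\,\beta(w)}{2\sigma^2}\right). \qquad (6.32)$$

Consequently, $r_i(w)$ and $r_{i,j}(w)$ in this specific case of (6.15) are

$$r_i(w) = \mathcal{C}_{\mathbb{F}_2}\left\{\gamma(w; 2^{i-1}y, \sigma)\right\} = \mathcal{C}_{\mathbb{F}_2}\left\{\exp\left(\frac{2^{i}\,y\,\beta(w)}{2\sigma^2}\right)\right\}, \qquad (6.33)$$

$$r_{i,j}(w) = \mathcal{C}_{\mathbb{F}_2}\left\{\gamma(w; -2^{i+j-2}, \sigma)\right\} = \mathcal{C}_{\mathbb{F}_2}\left\{\exp\left(-\frac{2^{i+j-1}\beta(w)}{2\sigma^2}\right)\right\}. \qquad (6.34)$$

Finally, the canonical factorization of $t(\mathbf{w})$ is

$$t(\mathbf{w}) = \mathcal{C}_{\mathbb{F}_2^N}\left\{\prod_{i=1}^{N} r_i(\mathbf{f}_i\mathbf{w}^T)\prod_{j=2}^{N}\prod_{i=1}^{j-1} r_{i,j}(\mathbf{a}_{i,j}\mathbf{w}^T)\right\}. \qquad (6.35)$$

Consequently, due to the discussion in Section 4.6 the canonical factorization of $p(\mathbf{x})$ is

$$p(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_2^N}\left\{\prod_{i=1}^{N} r_i(\mathbf{f}_i\mathbf{G}(N)^T\mathbf{x}^T)\prod_{j=2}^{N}\prod_{i=1}^{j-1} r_{i,j}(\mathbf{a}_{i,j}\mathbf{G}(N)^T\mathbf{x}^T)\right\}. \qquad (6.36)$$

Let $\mathbf{b}_{i,j}$ be defined as

$$\mathbf{b}_{i,j} \triangleq \sum_{k=i}^{j}\mathbf{f}_k. \qquad (6.37)$$

$\mathbf{f}_i\mathbf{G}(N)^T$ and $\mathbf{a}_{i,j}\mathbf{G}(N)^T$ can be expressed by using $\mathbf{b}_{i,j}$ as

$$\mathbf{f}_i\mathbf{G}(N)^T = \mathbf{b}_{i,N}, \qquad (6.38)$$

$$\mathbf{a}_{i,j}\mathbf{G}(N)^T = \mathbf{b}_{i,j-1}. \qquad (6.39)$$

Consequently,

$$p(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_2^N}\left\{\prod_{i=1}^{N} r_i(\mathbf{b}_{i,N}\mathbf{x}^T)\prod_{j=2}^{N}\prod_{i=1}^{j-1} r_{i,j}(\mathbf{b}_{i,j-1}\mathbf{x}^T)\right\} \qquad (6.40)$$

$$= \mathcal{C}_{\mathbb{F}_2^N}\left\{r_N(\mathbf{f}_N\mathbf{x}^T)\prod_{i=1}^{N-1} r_{i,i+1}(\mathbf{f}_i\mathbf{x}^T)\prod_{i=1}^{N-1} r_i(\mathbf{b}_{i,N}\mathbf{x}^T)\prod_{j=3}^{N}\prod_{i=1}^{j-2} r_{i,j}(\mathbf{b}_{i,j-1}\mathbf{x}^T)\right\}. \qquad (6.41)$$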
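This last form of the factorization clearly shows which parity check coefficient vectors are of weight two or more. Then, following the discussion in Section 1531, the parity check matrix of the code whose decoder can be employed in the detection of gray mapped M-PAM is

$$\mathbf{H}_{GRAY}(N) = \left[\begin{array}{c|c}\begin{matrix}\mathbf{b}_{1,2}\\ \mathbf{b}_{1,3}\\ \mathbf{b}_{2,3}\\ \vdots\\ \mathbf{b}_{1,N}\\ \mathbf{b}_{2,N}\\ \vdots\\ \mathbf{b}_{N-1,N}\end{matrix} & \mathbf{I}_{\frac{N(N-1)}{2}}\end{array}\right]. \qquad (6.42)$$

Notice that the sizes of $\mathbf{H}_{GRAY}(N)$ and $\mathbf{H}_{qPSK}(N)$ are the same.

The symbolwise and ML codeword decoders of the code with parity check matrix $\mathbf{H}_{GRAY}(N)$ can be designed for BPSK modulation and an AWGN channel. In order to achieve the desired detection, the inputs applied to this decoder should be a permuted version of the inputs applied for the naturally mapped detection, since the canonical factorization of $p(\mathbf{x})$ is derived from the canonical factorization of $t(\mathbf{w})$. The first $N$ of these inputs are

$$\left[\,-\frac{2^0 2^1}{\sigma^2} \;\; -\frac{2^1 2^2}{\sigma^2} \;\; \ldots \;\; -\frac{2^{N-2} 2^{N-1}}{\sigma^2} \;\; \frac{2^{N-1}y}{\sigma^2}\,\right].$$

The last $N - 1$ of these inputs are

$$\left[\,\frac{y}{\sigma^2} \;\; \frac{2y}{\sigma^2} \;\; \ldots \;\; \frac{2^{N-2}y}{\sigma^2}\,\right].$$

The remaining $\frac{(N-1)(N-2)}{2}$ inputs in between are

$$\left[\,-\frac{2^0 2^2}{\sigma^2} \;\; -\frac{2^0 2^3}{\sigma^2} \;\; -\frac{2^1 2^3}{\sigma^2} \;\; \ldots \;\; -\frac{2^0 2^{N-1}}{\sigma^2} \;\; -\frac{2^1 2^{N-1}}{\sigma^2} \;\; \ldots \;\; -\frac{2^{N-3} 2^{N-1}}{\sigma^2}\,\right].$$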


Example 6.3 This example shows how to compute marginal APPs of four bits which are modulated with gray mapped 16-PAM and passed through an AWGN channel with PSD $N_0/2 = \sigma^2$. The constellation diagram of the 16-PAM modulation with gray mapping is shown in Figure 6.3a. This example demonstrates an interesting property of demodulating gray mapped M-PAM modulation with decoders.

The parity check matrix of the code whose symbolwise decoder can be used to compute marginal APPs of the individual bits is

$$\mathbf{H}_{GRAY}(4) = \left[\begin{array}{cccc|cccccc}
1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
1 & 1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0\\
0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\right]. \qquad (6.43)$$

If the received value is denoted with $y$ then the input vector that should be applied to this decoder is

$$\left[\,-\frac{2}{\sigma^2} \;\; -\frac{8}{\sigma^2} \;\; -\frac{32}{\sigma^2} \;\; \frac{8y}{\sigma^2} \;\; -\frac{4}{\sigma^2} \;\; -\frac{8}{\sigma^2} \;\; -\frac{16}{\sigma^2} \;\; \frac{y}{\sigma^2} \;\; \frac{2y}{\sigma^2} \;\; \frac{4y}{\sigma^2}\,\right].$$


It is well known that carrying out row operations on the parity check matrix of a code does not alter the code. Hence, we can carry out row operations on $\mathbf{H}_{GRAY}(4)$ and obtain an alternative parity check matrix for the code. Let $\mathbf{H}'$ be the parity check matrix derived from $\mathbf{H}_{GRAY}(4)$ by adding the first row onto the second and fourth rows and then adding the sixth row onto the fourth and fifth rows, i.e.

$$\mathbf{H}' = \begin{bmatrix}
1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 0\\
0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 1\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1\\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}. \qquad (6.44)$$

Since $\mathbf{H}'$ and $\mathbf{H}_{GRAY}(4)$ are the parity check matrices of the same code, we can use the decoder designed for either $\mathbf{H}'$ or $\mathbf{H}_{GRAY}(4)$ to compute soft outputs in gray mapped 16-PAM modulation.


Notice that all rows of $\mathbf{H}'$ are of weight 3. Moreover, four columns of $\mathbf{H}'$ are of weight 3 and the remaining six columns are of weight one. $\mathbf{H}'$ shares these properties with $\mathbf{H}_{2PSK}(4)$. Furthermore, let $\mathbf{H}''$ be the parity check matrix derived from $\mathbf{H}'$ by interchanging the first column with the fifth and the fourth column with the tenth, i.e.

$$\mathbf{H}'' = \begin{bmatrix}
1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0\\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}. \qquad (6.45)$$

$\mathbf{H}''$ describes a code whose codewords are permuted forms of the codewords of the code described by $\mathbf{H}'$. Hence, we can also use the decoder designed for $\mathbf{H}''$ to compute soft outputs in gray mapped 16-PAM modulation. In order to achieve this demodulation it is necessary to permute the inputs applied to the decoder designed for $\mathbf{H}_{GRAY}(4)$ before applying them to the decoder designed for $\mathbf{H}''$, in the same order as the column permutations applied while passing from $\mathbf{H}'$ to $\mathbf{H}''$. Hence, the inputs that should be applied to this decoder are

$$\left[\,-\frac{4}{\sigma^2} \;\; -\frac{8}{\sigma^2} \;\; -\frac{32}{\sigma^2} \;\; \frac{4y}{\sigma^2} \;\; -\frac{2}{\sigma^2} \;\; -\frac{8}{\sigma^2} \;\; -\frac{16}{\sigma^2} \;\; \frac{y}{\sigma^2} \;\; \frac{2y}{\sigma^2} \;\; \frac{8y}{\sigma^2}\,\right].$$

The interesting point here is that $\mathbf{H}''$ is equal to $\mathbf{H}_{2PSK}(4)$. Therefore, the symbolwise decoder of the parity check matrix $\mathbf{H}_{2PSK}(4)$ can be used to compute marginal APPs for both naturally mapped and gray mapped 16-PAM modulation. The decoder can be configured for natural mapping or gray mapping by permuting the inputs. Computing the soft outputs of gray mapped 16-PAM modulation is depicted in Figure 6.3c.
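
The row operations and column interchanges of this example are easy to replay mechanically. The following Python sketch uses hypothetical helper names and plain lists of 0/1 entries; it builds both parity check matrices for $N = 4$, applies the stated row additions over $\mathbb{F}_2$ and the two column interchanges, and applies the same interchanges to the list of decoder inputs (written symbolically as strings, with the common factor $1/\sigma^2$ omitted).

```python
from itertools import combinations


def pairs(N):
    """Pairs (i, j), i < j, ordered (1,2), (1,3), (2,3), (1,4), ..."""
    return sorted(combinations(range(1, N + 1), 2), key=lambda p: (p[1], p[0]))


def with_identity(rows):
    """Append an identity block to the right of the given rows."""
    m = len(rows)
    return [row + [1 if c == r else 0 for c in range(m)] for r, row in enumerate(rows)]


def h_2psk(N):
    """Left block rows have ones at positions i and j."""
    return with_identity([[1 if k in (i, j) else 0 for k in range(1, N + 1)]
                          for i, j in pairs(N)])


def h_gray(N):
    """Left block rows have ones at positions i, i+1, ..., j."""
    return with_identity([[1 if i <= k <= j else 0 for k in range(1, N + 1)]
                          for i, j in pairs(N)])


def add_row(H, src, dst):
    """Add row src onto row dst over F_2 (1-based indices)."""
    H[dst - 1] = [(a + b) % 2 for a, b in zip(H[dst - 1], H[src - 1])]


def swap_cols(H, c1, c2):
    """Interchange two columns (1-based indices)."""
    for row in H:
        row[c1 - 1], row[c2 - 1] = row[c2 - 1], row[c1 - 1]


H = h_gray(4)
for src, dst in [(1, 2), (1, 4), (6, 4), (6, 5)]:
    add_row(H, src, dst)
H_prime = [row[:] for row in H]

swap_cols(H, 1, 5)
swap_cols(H, 4, 10)
H_double_prime = H

gray_inputs = ["-2", "-8", "-32", "8y", "-4", "-8", "-16", "y", "2y", "4y"]
permuted_inputs = gray_inputs[:]
permuted_inputs[0], permuted_inputs[4] = permuted_inputs[4], permuted_inputs[0]
permuted_inputs[3], permuted_inputs[9] = permuted_inputs[9], permuted_inputs[3]

if __name__ == "__main__":
    print(H_double_prime == h_2psk(4))
    print(permuted_inputs)
```

Since the row operations leave the code unchanged and the column interchanges only relabel code symbols, the same interchanges applied to the inputs are all that is needed to reuse the natural-mapping decoder.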


The example above shows that the decoder of $\mathbf{H}_{2PSK}(4)$ can be used to demodulate both naturally mapped and gray mapped 16-PAM modulation. The following theorem states that this is true not only for 16-PAM but for any M-PAM modulation.


Theorem 6.1 There exists a sequence of row operations such that performing these row operations on $\mathbf{H}_{GRAY}(N)$ leads to $\mathbf{H}_{2PSK}(N)$ with some columns permuted.


A constructive proof is given in Appendix A.4.2.
Figure 6.3: (a) Constellation diagram of gray mapped 16-PAM modulation. (b) Computing marginal APPs from the received symbol by using the symbolwise decoder of $\mathbf{H}_{GRAY}(4)$. (c) Computing marginal APPs by using the symbolwise decoder of $\mathbf{H}_{2PSK}(4)$.

6.5 MIMO detection by using channel decoders 


In this section we show how to employ channel decoders for multiple-input multiple-output (MIMO) detection. The analysis is presented for QPSK modulation, but it is straightforward to extend the method to any other PAM or QAM modulation.

6.5.1 System Model 

Let a random vector $\mathbf{X}_k = [X_{2k-1}, X_{2k}]$ be mapped to a complex symbol $W_k$ via the function
$\nu(\cdot)$ as in

\[ W_k = \nu(\mathbf{X}_k), \tag{6.46} \]

where $\nu(\cdot)$ represents the Gray mapped QPSK modulation and is defined as

\[ \nu(\mathbf{x}) = \begin{cases} 1, & \mathbf{x} = [0\ 0] \\ j, & \mathbf{x} = [0\ 1] \\ -1, & \mathbf{x} = [1\ 1] \\ -j, & \mathbf{x} = [1\ 0] \end{cases} \tag{6.47} \]

and $j$ is the square root of $-1$. The constellation diagram of Gray mapped QPSK modulation
is shown in Figure 6.4. Furthermore, let a random vector $\mathbf{W} = [W_1, W_2, \ldots, W_{N_t}]^T$ be passed
through an $N_r \times N_t$ MIMO channel with independent circularly symmetric Gaussian noise, and
let the received vector be $\mathbf{Y}$. In other words,

\[ \mathbf{Y} = \mathbf{H}_c \mathbf{W} + \mathbf{Z}, \tag{6.48} \]

where $\mathbf{H}_c$ is the $N_r \times N_t$ channel coefficient matrix and $\mathbf{Z}$ is the $N_r \times 1$ noise vector consisting of
independent, zero mean, circularly symmetric normal distributed random variables of variance
$2\sigma^2$.

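ML MIMO detection is the task of determining the configuration $\mathbf{x}$ that maximizes the likelihood
function $\Pr\{\mathbf{Y} = \mathbf{y} \mid \mathbf{X} = \mathbf{x}\}$, where $\mathbf{X}$ is

\[ \mathbf{X} \triangleq [\mathbf{X}_1\ \mathbf{X}_2\ \ldots\ \mathbf{X}_{N_t}]. \tag{6.49} \]

We assume that $\mathbf{X}$ is uniformly distributed. Hence, ML MIMO detection is equivalent to
finding the configuration maximizing the APP $\Pr\{\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y}\}$. Soft output MIMO detection
is the task of computing the marginal APPs $\Pr\{X_i = x \mid \mathbf{Y} = \mathbf{y}\}$.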
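
The two detection tasks defined above can be illustrated with a small brute-force sketch. It is only a reference computation under stated assumptions, not the method proposed in this chapter: the function names (`modulate`, `channel_output`, `brute_force_detect`) are hypothetical, the channel matrix is passed as a list of rows, and the bit-to-symbol table follows the Gray mapped QPSK constellation with points 1, j, -1, -j.

```python
import itertools
import math

# Gray mapped QPSK (assumed table): bit pair -> constellation point.
GRAY_QPSK = {(0, 0): 1 + 0j, (0, 1): 1j, (1, 1): -1 + 0j, (1, 0): -1j}


def modulate(bits):
    """Map the bit tuple (x_1, ..., x_{2Nt}) to the symbol vector w."""
    return [GRAY_QPSK[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def channel_output(Hc, w):
    """Noise-free channel output Hc w for a matrix given as a list of rows."""
    return [sum(h * s for h, s in zip(row, w)) for row in Hc]


def brute_force_detect(y, Hc, sigma):
    """Return the ML configuration and the marginals Pr{X_i = 1 | Y = y}.

    All 2^(2 Nt) configurations are visited, so this is usable only for
    small Nt. A uniform prior on X is assumed, as in the text.
    """
    n_bits = 2 * len(Hc[0])
    configs = list(itertools.product((0, 1), repeat=n_bits))
    logs = []
    for x in configs:
        hw = channel_output(Hc, modulate(x))
        dist2 = sum(abs(a - b) ** 2 for a, b in zip(y, hw))
        # log-likelihood up to an additive constant
        logs.append(-dist2 / (2 * sigma ** 2))
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]  # shift avoids underflow
    total = sum(weights)
    marginals = [
        sum(wt for x, wt in zip(configs, weights) if x[i] == 1) / total
        for i in range(n_bits)
    ]
    return configs[logs.index(top)], marginals
```

The cost grows as $2^{2N_t}$, which is the "trivial" detector that the decoder-based approach of this chapter is compared against.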
Figure 6.4: The QPSK constellation with Gray mapping


6.5.2 The decoders which can be used in MIMO detection with QPSK signaling 


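The first step in determining the parity check matrix of the decoders which can be employed as
MIMO demodulators is determining the canonical factorization of the APP. The APP
$\Pr\{\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y}\}$ is

\[ \Pr\{\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y}\} = \mathcal{C}_{\mathbb{F}_2^{2N_t}} \left\{ \exp\left( -\frac{\|\mathbf{y} - \mathbf{H}_c \mathbf{w}\|^2}{2\sigma^2} \right) \right\}, \tag{6.50} \]

where $\mathbf{x}$ is $[\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{N_t}]$, $\mathbf{x}_k$ is $[x_{2k-1}, x_{2k}]$, and $\mathbf{w}$ is $[\nu(\mathbf{x}_1), \nu(\mathbf{x}_2), \ldots, \nu(\mathbf{x}_{N_t})]^T$. As shown
in Appendix A.4.3, this APP can be factored as

\[ \begin{aligned}
\Pr\{\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y}\} \propto{} & \prod_{k=1}^{N_t} \gamma\left(x_{2k-1}; \frac{\mathrm{Re}\{u_k\} + \mathrm{Im}\{u_k\}}{2}, \sigma\right) \gamma\left(x_{2k}; \frac{\mathrm{Re}\{u_k\} - \mathrm{Im}\{u_k\}}{2}, \sigma\right) \\
& \cdot \prod_{k=2}^{N_t} \prod_{l=1}^{k-1} \gamma\left(x_{2k-1} + x_{2l-1}; -\frac{\mathrm{Re}\{(\mathbf{R})_{k,l}\}}{2}, \sigma\right) \gamma\left(x_{2k-1} + x_{2l}; -\frac{\mathrm{Im}\{(\mathbf{R})_{k,l}\}}{2}, \sigma\right) \\
& \cdot \prod_{k=2}^{N_t} \prod_{l=1}^{k-1} \gamma\left(x_{2k} + x_{2l-1}; \frac{\mathrm{Im}\{(\mathbf{R})_{k,l}\}}{2}, \sigma\right) \gamma\left(x_{2k} + x_{2l}; -\frac{\mathrm{Re}\{(\mathbf{R})_{k,l}\}}{2}, \sigma\right),
\end{aligned} \tag{6.51} \]

where $\mathbf{R}$ and $\mathbf{u}$ are

\[ \mathbf{R} = \mathbf{H}_c^H \mathbf{H}_c, \tag{6.52} \]

\[ \mathbf{u} = \mathbf{H}_c^H \mathbf{y}, \tag{6.53} \]

and $(\mathbf{R})_{k,l}$ denotes the $k$ by $l$ entry of $\mathbf{R}$ and $u_k$ is the $k$-th component of $\mathbf{u}$.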
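
A factorization of this kind can be checked numerically by comparing it with the direct expression of the APP over all configurations. The sketch below is a sanity check under explicit assumptions rather than the derivation of Appendix A.4.3: it assumes that $\gamma(x; \mu, \sigma)$ is proportional to $\exp(\mu(-1)^x / \sigma^2)$ (the BPSK over AWGN likelihood with bit 0 mapped to +1), that the constellation points are 1, j, -1, -j for the bit pairs 00, 01, 11, 10, and that the signs of the pairwise terms are those obtained by expanding $\|\mathbf{y} - \mathbf{H}_c\mathbf{w}\|^2$ with $(\mathbf{R})_{k,l}$, $k > l$. All function names are hypothetical.

```python
import itertools

# Assumed Gray mapped QPSK table: bit pair -> constellation point.
GRAY_QPSK = {(0, 0): 1 + 0j, (0, 1): 1j, (1, 1): -1 + 0j, (1, 0): -1j}


def gram_and_matched_filter(Hc, y):
    """Return R = Hc^H Hc and u = Hc^H y for Hc given as a list of rows."""
    nr, nt = len(Hc), len(Hc[0])
    R = [[sum(Hc[r][k].conjugate() * Hc[r][l] for r in range(nr))
          for l in range(nt)] for k in range(nt)]
    u = [sum(Hc[r][k].conjugate() * y[r] for r in range(nr)) for k in range(nt)]
    return R, u


def log_gamma(bit, mu, sigma):
    """Assumed log of gamma(bit; mu, sigma) up to a constant."""
    return mu * (1 - 2 * bit) / sigma ** 2


def log_app_direct(x, y, Hc, sigma):
    """-||y - Hc w||^2 / (2 sigma^2) for the configuration x."""
    w = [GRAY_QPSK[(x[i], x[i + 1])] for i in range(0, len(x), 2)]
    dist2 = sum(abs(y_r - sum(h * s for h, s in zip(row, w))) ** 2
                for row, y_r in zip(Hc, y))
    return -dist2 / (2 * sigma ** 2)


def log_app_factored(x, R, u, sigma):
    """Sum of the log factors: single-bit terms from u, pair terms from R."""
    total = 0.0
    for k in range(len(u)):
        a, b = x[2 * k], x[2 * k + 1]  # x_{2k-1}, x_{2k} in 1-based notation
        total += log_gamma(a, (u[k].real + u[k].imag) / 2, sigma)
        total += log_gamma(b, (u[k].real - u[k].imag) / 2, sigma)
        for l in range(k):
            c, d = x[2 * l], x[2 * l + 1]
            r = R[k][l]
            total += log_gamma(a ^ c, -r.real / 2, sigma)
            total += log_gamma(a ^ d, -r.imag / 2, sigma)
            total += log_gamma(b ^ c, r.imag / 2, sigma)
            total += log_gamma(b ^ d, -r.real / 2, sigma)
    return total


def factorization_gap(y, Hc, sigma):
    """Spread of (direct - factored) over all x; close to 0 if they agree."""
    R, u = gram_and_matched_filter(Hc, y)
    n_bits = 2 * len(Hc[0])
    diffs = [log_app_direct(x, y, Hc, sigma) - log_app_factored(x, R, u, sigma)
             for x in itertools.product((0, 1), repeat=n_bits)]
    return max(diffs) - min(diffs)
```

If the two expressions differ only by a constant that does not depend on $\mathbf{x}$, `factorization_gap` returns a value at the level of floating point rounding, which is what proportionality requires.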
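The factorization above can be expressed by using the $\mathbf{a}_{i,j}$ vectors defined in (6.16) as

\[ \begin{aligned}
\Pr\{\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y}\} \propto{} & \prod_{k=1}^{N_t} \gamma\left(\mathbf{f}_{2k-1}\mathbf{x}^T; \frac{\mathrm{Re}\{u_k\} + \mathrm{Im}\{u_k\}}{2}, \sigma\right) \gamma\left(\mathbf{f}_{2k}\mathbf{x}^T; \frac{\mathrm{Re}\{u_k\} - \mathrm{Im}\{u_k\}}{2}, \sigma\right) \\
& \cdot \prod_{k=2}^{N_t} \prod_{l=1}^{k-1} \gamma\left(\mathbf{a}_{2k-1,2l-1}\mathbf{x}^T; -\frac{\mathrm{Re}\{(\mathbf{R})_{k,l}\}}{2}, \sigma\right) \gamma\left(\mathbf{a}_{2k-1,2l}\mathbf{x}^T; -\frac{\mathrm{Im}\{(\mathbf{R})_{k,l}\}}{2}, \sigma\right) \\
& \cdot \prod_{k=2}^{N_t} \prod_{l=1}^{k-1} \gamma\left(\mathbf{a}_{2k,2l-1}\mathbf{x}^T; \frac{\mathrm{Im}\{(\mathbf{R})_{k,l}\}}{2}, \sigma\right) \gamma\left(\mathbf{a}_{2k,2l}\mathbf{x}^T; -\frac{\mathrm{Re}\{(\mathbf{R})_{k,l}\}}{2}, \sigma\right).
\end{aligned} \tag{6.54} \]
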


The only remaining step in the derivation of canonical factorization is to normalize all of the 
factor functions existing above. We omit this obvious step for the sake of neatness. This 
factorization leads to the parity check matrix of the decoder which can be used in MIMO 
detection given in 


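\[ \mathbf{H}_{MIMO,QPSK}(N_t) = \begin{bmatrix} \begin{matrix} \mathbf{L}(1, N_t) \\ \mathbf{L}(2, N_t) \\ \vdots \\ \mathbf{L}(N_t - 1, N_t) \end{matrix} & \mathbf{I}_{2N_t(N_t-1) \times 2N_t(N_t-1)} \end{bmatrix}, \tag{6.55} \]

where $\mathbf{L}(k, N_t)$ is

\[ \mathbf{L}(k, N_t) = \begin{bmatrix} \mathbf{a}_{1,2k+1} \\ \mathbf{a}_{2,2k+1} \\ \vdots \\ \mathbf{a}_{2k,2k+1} \\ \mathbf{a}_{1,2k+2} \\ \mathbf{a}_{2,2k+2} \\ \vdots \\ \mathbf{a}_{2k,2k+2} \end{bmatrix}. \tag{6.56} \]

As in the previous sections, we can use a decoder designed for BPSK modulation and AWGN
channel for MIMO detection. The inputs that should be applied to this decoder to achieve
MIMO detection are the first parameters after the semicolon divided by the second parameters
of the $\gamma(\cdot;\cdot,\cdot)$ functions in the factorization given in (6.54).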
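
The structure of this parity check matrix can be made concrete with a short construction sketch. It assumes that $\mathbf{a}_{i,j}$ is the length-$2N_t$ binary row vector with ones exactly at positions $i$ and $j$ (the definition in (6.16) lies outside this section, so this is an assumption), and the function names are hypothetical.

```python
def pair_vector(i, j, n):
    """Assumed a_{i,j}: length-n binary row with ones at 1-based positions i, j."""
    return [1 if p in (i, j) else 0 for p in range(1, n + 1)]


def l_block(k, nt):
    """Rows of L(k, Nt): a_{1,2k+1}..a_{2k,2k+1}, then a_{1,2k+2}..a_{2k,2k+2}."""
    n = 2 * nt
    return [pair_vector(i, j, n)
            for j in (2 * k + 1, 2 * k + 2)
            for i in range(1, 2 * k + 1)]


def h_mimo_qpsk(nt):
    """Stack L(1,Nt), ..., L(Nt-1,Nt) and append an identity block."""
    rows = [r for k in range(1, nt) for r in l_block(k, nt)]
    m = len(rows)  # equals 2 * nt * (nt - 1)
    return [r + [1 if c == i else 0 for c in range(m)]
            for i, r in enumerate(rows)]
```

For example, `h_mimo_qpsk(2)` has 4 rows and 8 columns: the first four columns correspond to the bits $x_1, \ldots, x_4$ and the last four to the pairwise sums of bits that belong to different symbols.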


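Example 6.4 This example shows how to compute marginal APPs of four bits which are first
modulated with QPSK modulation and passed through a 2 x 2 MIMO channel with channel
coefficient matrix $\mathbf{H}_c$ and noise variance $2\sigma^2$. The parity check matrix of the symbolwise
decoder which can be used for this purpose is

\[ \mathbf{H}_{MIMO,QPSK}(2) = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 \end{bmatrix}. \tag{6.57} \]

Let the vector $\mathbf{t} = [t_1, t_2, \ldots, t_8]$ be

\[ \begin{aligned} \mathbf{t} = \frac{1}{2\sigma} \big[ & \mathrm{Re}\{u_1\} + \mathrm{Im}\{u_1\},\ \mathrm{Re}\{u_1\} - \mathrm{Im}\{u_1\},\ \mathrm{Re}\{u_2\} + \mathrm{Im}\{u_2\},\ \mathrm{Re}\{u_2\} - \mathrm{Im}\{u_2\}, \\ & -\mathrm{Re}\{(\mathbf{R})_{1,2}\},\ \mathrm{Im}\{(\mathbf{R})_{1,2}\},\ -\mathrm{Im}\{(\mathbf{R})_{1,2}\},\ -\mathrm{Re}\{(\mathbf{R})_{1,2}\} \big], \end{aligned} \tag{6.58} \]

where $\mathbf{R} = \mathbf{H}_c^H \mathbf{H}_c$ and $\mathbf{u} = \mathbf{H}_c^H \mathbf{y}$ (recall that $\mathbf{R}$ is Hermitian, so
$\mathrm{Im}\{(\mathbf{R})_{2,1}\} = -\mathrm{Im}\{(\mathbf{R})_{1,2}\}$). This $\mathbf{t}$ vector is the vector that must be applied to the decoder.

Notice that $\mathbf{H}_{MIMO,QPSK}(2)$ is a sub-matrix of $\mathbf{H}_{2PSK}(4)$. Therefore, the symbolwise decoder
of $\mathbf{H}_{2PSK}(4)$ can also be used to compute marginal APPs in MIMO detection. The
inputs that must be applied in this case are given below.

\[ \begin{aligned} \frac{1}{2\sigma} \big[ & \mathrm{Re}\{u_1\} + \mathrm{Im}\{u_1\},\ \mathrm{Re}\{u_1\} - \mathrm{Im}\{u_1\},\ \mathrm{Re}\{u_2\} + \mathrm{Im}\{u_2\},\ \mathrm{Re}\{u_2\} - \mathrm{Im}\{u_2\},\ 0, \\ & -\mathrm{Re}\{(\mathbf{R})_{1,2}\},\ \mathrm{Im}\{(\mathbf{R})_{1,2}\},\ -\mathrm{Im}\{(\mathbf{R})_{1,2}\},\ -\mathrm{Re}\{(\mathbf{R})_{1,2}\},\ 0 \big] \end{aligned} \tag{6.59} \]

Notice that we added two zeros to the input vector when compared to the vector $\mathbf{t}$. These zeros
correspond to the missing columns in $\mathbf{H}_{MIMO,QPSK}(2)$ when compared to $\mathbf{H}_{2PSK}(4)$. Computing
the marginal APPs with these two decoders is depicted in Figure 6.5.

Figure 6.5: Computing the marginal APPs in a MIMO system by using two different decoders.
(a) By using the decoder of $\mathbf{H}_{MIMO,QPSK}(2)$. (b) By using the decoder of $\mathbf{H}_{2PSK}(4)$.

It is worth emphasizing that in Examples 6.2, 6.3, and 6.4 the decoder of $\mathbf{H}_{2PSK}(4)$ is used for
three different purposes.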


6.6 Usage of decoders of tail biting convolutional codes as approximate MIMO 
detectors 


Trellis representation is mainly used for representing convolutional codes. However, it is also 
possible to represent block codes with trellises |[3^ . Block codes can also be represented 
with a special type of trellis which is the tail biting trellis. Maximum trellis width in a tail 
biting trellis might be as low as the square root of the maximum width of the ordinary trellis 
representing the same code ifTTl Ihll. 


If a block code has a parity check matrix as in the form given below 


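\[ \mathbf{H} = \begin{bmatrix} \begin{matrix} ((\mathbf{L}_{r \times c}))_0 \\ ((\mathbf{L}_{r \times c}))_1 \\ \vdots \\ ((\mathbf{L}_{r \times c}))_{c-1} \end{matrix} & \mathbf{I}_{rc \times rc} \end{bmatrix}, \tag{6.60} \]

where $\mathbf{L}_{r \times c}$ is any $r \times c$ matrix and $((\mathbf{L}))_i$ denotes cyclically shifting the columns of $\mathbf{L}$ towards
right $i$ times, then it is called a tail biting convolutional code of rate $1/(r+1)$. For instance,
the Golay code is of this type [11]. Tail biting convolutional codes can be encoded by the
encoders of the convolutional codes by applying the data bits cyclically.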


The tail biting convolutional codes have simple approximate decoders enjoying low complex¬ 
ity nnii. Hence, there are many studies and standards, such as LTE, exploiting this reduc¬ 
tion in complexity and simplicity of the tail biting trellises. Even an analog implementation 
of such a decoder is proposed in [14].
In this section we show that the MIMO detection problem can be handled by the decoder of
a tail biting convolutional code. The characteristics of this code depend on the number of
transmitting antennae and the modulation used. We are going to analyze the MIMO detectors for
QPSK modulation as we did in the previous section, although it is possible to generalize the
technique to other QAM and PAM modulations as well.

We are going to use the same channel model and notation as in the previous section. That
model led us to the parity check matrix $\mathbf{H}_{MIMO,QPSK}(N_t)$ given in (6.55). This parity check
matrix hardly looks like the parity check matrix of a tail biting convolutional code.

Let a permutation matrix $\mathbf{P}$ be defined as in



Furthermore, let V be obtained by permuting X as in 

V = XP. (6.62) 


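Since $\mathbf{V}$ is a permutation of $\mathbf{X}$, maximizing (marginalizing) the APP $\Pr\{\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y}\}$ is
equivalent to maximizing (marginalizing) the APP $\Pr\{\mathbf{V} = \mathbf{v} \mid \mathbf{Y} = \mathbf{y}\}$. Let $t(\mathbf{v})$ be a shorthand
notation for $\Pr\{\mathbf{V} = \mathbf{v} \mid \mathbf{Y} = \mathbf{y}\}$. Then the factorization of $t(\mathbf{v})$ can be derived from the
factorization (6.54) as

\[ \begin{aligned}
t(\mathbf{v}) \propto{} & \prod_{k=1}^{N_t} \gamma\left(\mathbf{f}_{2k-1}\mathbf{P}\mathbf{v}^T; \frac{\mathrm{Re}\{u_k\} + \mathrm{Im}\{u_k\}}{2}, \sigma\right) \gamma\left(\mathbf{f}_{2k}\mathbf{P}\mathbf{v}^T; \frac{\mathrm{Re}\{u_k\} - \mathrm{Im}\{u_k\}}{2}, \sigma\right) \\
& \cdot \prod_{k=2}^{N_t} \prod_{l=1}^{k-1} \gamma\left(\mathbf{a}_{2k-1,2l-1}\mathbf{P}\mathbf{v}^T; -\frac{\mathrm{Re}\{(\mathbf{R})_{k,l}\}}{2}, \sigma\right) \gamma\left(\mathbf{a}_{2k-1,2l}\mathbf{P}\mathbf{v}^T; -\frac{\mathrm{Im}\{(\mathbf{R})_{k,l}\}}{2}, \sigma\right) \\
& \cdot \prod_{k=2}^{N_t} \prod_{l=1}^{k-1} \gamma\left(\mathbf{a}_{2k,2l-1}\mathbf{P}\mathbf{v}^T; \frac{\mathrm{Im}\{(\mathbf{R})_{k,l}\}}{2}, \sigma\right) \gamma\left(\mathbf{a}_{2k,2l}\mathbf{P}\mathbf{v}^T; -\frac{\mathrm{Re}\{(\mathbf{R})_{k,l}\}}{2}, \sigma\right),
\end{aligned} \tag{6.63} \]

since $(\mathbf{P}^{-1})^T = \mathbf{P}$. Consequently, a parity check matrix whose ML codeword (symbolwise)
decoder can be employed in maximization (marginalization) of $\Pr\{\mathbf{V} = \mathbf{v} \mid \mathbf{Y} = \mathbf{y}\}$ is

\[ \mathbf{H}_V(N_t) = \begin{bmatrix} \mathbf{B}(N_t) & \mathbf{I}_{2N_t(N_t-1) \times 2N_t(N_t-1)} \end{bmatrix}, \tag{6.64} \]

where $\mathbf{B}(N_t)$ is

\[ \mathbf{B}(N_t) = \begin{bmatrix} \mathbf{L}(1, N_t)\mathbf{P} \\ \mathbf{L}(2, N_t)\mathbf{P} \\ \vdots \\ \mathbf{L}(N_t - 1, N_t)\mathbf{P} \end{bmatrix}. \tag{6.65} \]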
[Figure: encoder diagram; input sequence $v_{N_t}, v_{N_t+1}, \ldots, v_{2N_t}, v_1, v_2, \ldots$]

Figure 6.6: The encoder of the tail biting convolutional code whose decoder can be used as the
detector of a MIMO system with $N_t$ transmit and $N_r$ receiving antennae. Notice that, as opposed
to ordinary convolutional encoders, the encoder does not start from the all zero state. The tail
biting nature of the decoder arises from the fact that after the whole input sequence is applied the
decoder returns to its initial condition.


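Other alternative parity check matrices whose decoders can be employed in performing inference
on $\Pr\{\mathbf{V} = \mathbf{v} \mid \mathbf{Y} = \mathbf{y}\}$ are in the form of

\[ \begin{bmatrix} \mathbf{B}'(N_t) & \mathbf{I}_{2N_t(N_t-1) \times 2N_t(N_t-1)} \end{bmatrix}, \]

where $\mathbf{B}'(N_t)$ is derived from $\mathbf{B}(N_t)$ by permuting rows (not columns this time). Fortunately,
there exists a special row permutation which brings $\mathbf{B}(N_t)$ into the form given in

\[ \mathbf{B}_{TB}(N_t) = \begin{bmatrix} ((\mathbf{L}_{TB}(N_t)\ \ \mathbf{0}_{(N_t-1) \times N_t}))_0 \\ ((\mathbf{L}_{TB}(N_t)\ \ \mathbf{0}_{(N_t-1) \times N_t}))_1 \\ \vdots \\ ((\mathbf{L}_{TB}(N_t)\ \ \mathbf{0}_{(N_t-1) \times N_t}))_{2N_t-1} \end{bmatrix}, \tag{6.66} \]

where $\mathbf{L}_{TB}(N_t)$ is

\[ \mathbf{L}_{TB}(N_t) = \begin{bmatrix} \mathbf{I}_{(N_t-1) \times (N_t-1)} & \mathbf{1}_{(N_t-1) \times 1} \end{bmatrix}. \tag{6.67} \]

Consequently, the decoders of the parity check matrix given in

\[ \mathbf{H}_{TB,MIMO}(N_t) = \begin{bmatrix} \mathbf{B}_{TB}(N_t) & \mathbf{I}_{2N_t(N_t-1) \times 2N_t(N_t-1)} \end{bmatrix} \tag{6.68} \]

can be employed in performing inference on $\Pr\{\mathbf{V} = \mathbf{v} \mid \mathbf{Y} = \mathbf{y}\}$. $\mathbf{H}_{TB,MIMO}(N_t)$ is the parity
check matrix of the tail biting convolutional code of rate $1/N_t$ and constraint length $N_t$,
whose encoder is shown in Figure 6.6.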
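
The claim that a row permutation turns $\mathbf{B}(N_t)$ into the cyclically shifted form can be checked for small $N_t$ with the following sketch. Two points are assumptions rather than statements of the text: $\mathbf{a}_{i,j}$ is taken to be the binary row with ones at positions $i$ and $j$, and the column permutation is taken to place the odd-indexed bits first, $\mathbf{v} = [x_1, x_3, \ldots, x_{2N_t-1}, x_2, x_4, \ldots, x_{2N_t}]$, which is the ordering consistent with the $N_t = 2$ and $N_t = 3$ matrices of Example 6.5. The function names are hypothetical.

```python
def cyclic_shift_right(row, s):
    """Cyclically shift a row towards the right by s positions."""
    s %= len(row)
    return row[-s:] + row[:-s] if s else list(row)


def b_tb(nt):
    """All 2*Nt cyclic shifts of [L_TB(Nt) 0] with L_TB(Nt) = [I 1]."""
    base = [[1 if c in (i, nt - 1) else 0 for c in range(nt)] + [0] * nt
            for i in range(nt - 1)]
    return [cyclic_shift_right(r, s) for s in range(2 * nt) for r in base]


def b_matrix(nt):
    """Rows of L(1,Nt), ..., L(Nt-1,Nt) with columns reordered odd bits first.

    The column order is an assumption inferred from Example 6.5.
    """
    n = 2 * nt
    order = list(range(0, n, 2)) + list(range(1, n, 2))  # 0-based x indices
    rows = []
    for k in range(1, nt):
        for j in (2 * k, 2 * k + 1):      # 0-based bits of symbol k + 1
            for i in range(2 * k):        # 0-based bits of symbols 1..k
                x_row = [1 if p in (i, j) else 0 for p in range(n)]
                rows.append([x_row[src] for src in order])
    return rows


def is_row_permutation(a, b):
    """True if the two matrices contain the same rows in some order."""
    return sorted(a) == sorted(b)
```

Under these assumptions `is_row_permutation(b_matrix(nt), b_tb(nt))` can be evaluated for any small `nt`. The underlying reason is that both matrices list every pair of positions in $\mathbf{v}$ except the pairs at cyclic distance $N_t$, which are the two bits of the same QPSK symbol.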
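Example 6.5 In this example we demonstrate that $\mathbf{B}_{TB}(N_t)$ can be derived from $\mathbf{B}(N_t)$ by
permuting rows for the cases $N_t = 2$ and $N_t = 3$.

For $N_t = 2$, $\mathbf{B}(N_t)$ is equal to $\mathbf{L}(1,2)\mathbf{P}$. By (6.56), $\mathbf{L}(1,2)$ is

\[ \mathbf{L}(1,2) = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \end{bmatrix}. \tag{6.69} \]

Consequently, $\mathbf{B}(2)$ is

\[ \mathbf{B}(2) = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}. \tag{6.70} \]

Changing the places of the third and fourth rows gives $\mathbf{B}_{TB}(2)$, which is

\[ \mathbf{B}_{TB}(2) = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix}. \tag{6.71} \]

The Tanner graph of the resulting $\mathbf{H}_{TB,MIMO}(2) = [\mathbf{B}_{TB}(2)\ \ \mathbf{I}_{4 \times 4}]$ is shown in Figure 6.7-a.

For $N_t = 3$, $\mathbf{B}(N_t)$ is equal to $\begin{bmatrix} \mathbf{L}(1,3) \\ \mathbf{L}(2,3) \end{bmatrix} \mathbf{P}$, where $\begin{bmatrix} \mathbf{L}(1,3) \\ \mathbf{L}(2,3) \end{bmatrix}$ is

\[ \begin{bmatrix} \mathbf{L}(1,3) \\ \mathbf{L}(2,3) \end{bmatrix} = \begin{bmatrix}
1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 1
\end{bmatrix}. \tag{6.72} \]

Then $\mathbf{B}(3)$ is

\[ \mathbf{B}(3) = \begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 1
\end{bmatrix}. \tag{6.73} \]

Finally, carrying the 5th, 7th, 2nd, 6th, 8th, 4th, 10th, 12th, 3rd, 9th, 11th, and 1st rows to the
1st, 2nd, ..., 12th rows gives $\mathbf{B}_{TB}(3)$ as in

\[ \mathbf{B}_{TB}(3) = \begin{bmatrix}
1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 & 0
\end{bmatrix}. \tag{6.74} \]

The Wiberg style Tanner graph of the resulting $\mathbf{H}_{TB,MIMO}(3) = [\mathbf{B}_{TB}(3)\ \ \mathbf{I}_{12 \times 12}]$ is shown in
Figure 6.7-b.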
Figure 6.7: The Tanner graphs of $\mathbf{H}_{TB,MIMO}(N_t)$ for $N_t = 2$ and $N_t = 3$. (a) Tanner graph of
$\mathbf{H}_{TB,MIMO}(2)$. (b) Wiberg style Tanner graph of $\mathbf{H}_{TB,MIMO}(3)$.

6.6.1 Using the decoding algorithms of tail biting convolutional codes for MIMO detection

Since a tail biting trellis does not have a starting or an ending state, the Viterbi and BCJR algorithms
cannot be run on such trellises directly. To process a tail biting trellis with the Viterbi algorithm,
we need to run the Viterbi algorithm $v$ times on the trellis, where $v$ denotes the trellis width. In
each run, the Viterbi algorithm determines a candidate path, which is the most probable path
among the paths starting and ending on a certain state of the trellis. Then the most probable
path can be chosen among the $v$ candidate paths. Since the complexity of each run of the
Viterbi algorithm is $O(Lv)$, where $L$ denotes the length of the trellis, the complexity of determining
the most probable path with the Viterbi algorithm is $O(Lv^2)$. Recall that the complexity
would be $O(Lv)$ if the trellis were an ordinary trellis. Similar arguments are true for the BCJR
algorithm as well.

The complexity of the ML codeword and exact symbolwise decoders of $\mathbf{H}_{TB,MIMO}(N_t)$ is $O(N_t 2^{2N_t})$,
as explained in the previous paragraph. The complexity of the trivial MIMO detection algorithm
is $O(2^{2N_t})$. Hence, using the exact decoders of $\mathbf{H}_{TB,MIMO}(N_t)$ for MIMO detection does
not make sense.

Fortunately, tail biting convolutional codes have an approximate symbolwise decoder. This 
decoder operates by running BCJR algorithm on the tail biting trellis iteratively. Equivalently, 
this decoder can be viewed as the iterative sum-product algorithm running on the Wiberg 
style Tanner graph, an example of which is shown in Figure 6.7-b. Usually, a few iterations
are sufficient to converge lH. We propose implementing an approximate soft output MIMO 
detector by using this approximate symbolwise decoder of $\mathbf{H}_{TB,MIMO}(N_t)$. Such a MIMO
detector is also capable of using any a priori information available since it uses the BCJR
algorithm. The block diagram of this approximate soft output MIMO detector is shown in
Figure 6.8.


6.6.2 Complexity issues 

There are two subtasks when the decoder mentioned above is employed as an approximate soft
output MIMO detector. These tasks are the computation of the inputs applied to the decoder
and the processing of the decoder trellis.
Figure 6.8: Block diagram of the proposed approximate soft output MIMO detector which
uses the approximate decoder of a tail biting convolutional code.

As explained in Section 6.5, the inputs that must be applied to the decoder of $\mathbf{H}_{TB,MIMO}(N_t)$
are the components of $\mathbf{u}$ and the entries of $\mathbf{R}$ defined in (6.53) and (6.52), respectively. The
computation of $\mathbf{u}$ has a complexity $O(N_r N_t)$ whereas the computation of $\mathbf{R}$ has a complexity
$O(N_t^2 N_r)$.

Processing the decoding trellis with the BCJR algorithm has a complexity $O(N_t 2^{N_t})$. This
complexity is almost the square root of the complexity of the trivial ML and soft output MIMO
detectors, which is $O(2^{2N_t})$. From a computer science point of view, this last component of
the complexity might dominate the complexity of the computation of $\mathbf{R}$. However, from
an engineering point of view, computing $\mathbf{R}$ is a more computationally demanding task than
processing the decoding trellis for two reasons. First, in a practical scenario $N_r$ and $N_t$ are eight
at most. Hence, $N_t 2^{N_t}$ and $N_t^2 N_r$ are comparable in practical scenarios. Second, the decoding
trellis processing involves only additions and maximizations¹, whereas computing $\mathbf{R}$ involves
complex multiplications, which require much more complex hardware than addition. Therefore,
computing $\mathbf{R}$ is the most computationally demanding subtask of the proposed method.
However, it should be noted that $\mathbf{R}$ is computed only once for a constant $\mathbf{H}_c$.
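
To give a feeling for the orders of growth discussed above, the following sketch evaluates the expressions with all constant factors dropped. The numbers are therefore only indicative and do not count actual hardware operations; the function name and the dictionary keys are hypothetical.

```python
def operation_counts(nt, nr):
    """Order-of-growth expressions from the text, constants dropped."""
    return {
        "trellis_bcjr": nt * 2 ** nt,        # O(Nt 2^Nt), approximate decoder
        "exhaustive_search": 2 ** (2 * nt),  # O(2^(2 Nt)), trivial detector
        "compute_R": nt ** 2 * nr,           # O(Nt^2 Nr)
        "compute_u": nr * nt,                # O(Nr Nt)
    }


def print_table(max_antennas=8):
    """Print the expressions for square systems up to max_antennas."""
    for n in range(2, max_antennas + 1):
        c = operation_counts(n, n)
        print(n, c["trellis_bcjr"], c["compute_R"], c["exhaustive_search"])
```

For a square system with eight antennas the trellis term and the term for computing $\mathbf{R}$ stay within a small factor of each other, while the exhaustive search term is larger by several orders of magnitude.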

The proposed technique, which employs a tail biting decoder as the MIMO detector, is comparable
in terms of hardware complexity to other suboptimal methods such as minimum mean square
error (MMSE) or zero forcing (ZF) detectors, which both have a complexity $O(N^3)$
if $N_t = N_r = N$. Furthermore, these other suboptimal methods require matrix inversion. Although
matrix inversion also has complexity $O(N^3)$, it requires complex number divisions, which require
even more complex hardware than multiplication. Hence, the proposed technique still has an
advantage in terms of hardware complexity over MMSE and ZF detectors.

¹ We assume that the Max-Log-MAP approximation is used for the BCJR algorithm running on the trellis.
Figure 6.9: BER performances of the MIMO detector using the decoder of a tail biting convolutional
code, the symbolwise MAP MIMO detector, and the linear MMSE MIMO detector
in a Rayleigh fading 8x8 MIMO channel.


6.6.3 Simulation Results 

We simulated the proposed approximate soft output MIMO detector for the 8x8 Rayleigh
fading MIMO channel. In this channel the entries of $\mathbf{H}_c$ are independent, zero-mean, circularly
symmetric Gaussian random variables where the variances of the real and imaginary parts are
1/2. We assumed that $\mathbf{H}_c$ changes for every transmitted MIMO symbol and is perfectly known
at the receiver side. The noise vector added at the receiver also consists of independent, zero-mean,
circularly symmetric Gaussian random variables where the variances of the real and
imaginary parts are $N_0/2$. The signal to noise ratio (SNR) per receiving antenna is $E_b/N_0$.
Since there are $N_r$ receiving antennae in a MIMO system, the convention is to use $N_r E_b/N_0$ as
SNR HTtH . 

The bit error rate (BER) performance of the proposed algorithm is shown in Figure 6.9. These
results show that the proposed method has an unexpectedly poor performance when compared
to the symbolwise MAP MIMO detector. Moreover, the proposed method exhibits an error
floor as early as 2 x 10“^ level. The performance of the proposed algorithm is better than the 
linear minimum mean square error (MMSE) MIMO detector [43] until 16 dB. After 16 dB the
performance of the linear MMSE becomes better due to the early error floor of the proposed
MIMO detector. We provide some comments on this unexpected performance in the next
section and propose an improvement in Section 6.6.5.


6.6.4 Comments on the convergence of the sum-product algorithm on factor graphs 
with a single cycle 

The Wiberg style Tanner graph that represents the tail biting trellis contains only a single loop,
as in Figure 6.7-a. There are many studies in the sum-product algorithm literature which claim
that the sum-product algorithm running on a Tanner graph with a single cycle always converges
such as |[38l[39ll^ . These studies also claim that the approximate marginals computed by 
the sum-product algorithm are close to the exact marginals when the sum-product algorithm runs on these
graphs. According to these studies, our proposed MIMO detector was supposed to converge
at all times and it was expected to yield good results. However, our empirical results shown
in Figure 6.9 do not agree with these expectations.

Our experimental results verify that the sum-product algorithm running on a Tanner factor
graph with a single cycle always converges. However, in some cases this convergence requires
as few as two or three iterations, whereas in some other rare cases it might require
thousands of iterations. A detailed analysis of the experimental results shows that the relatively
high error floor in Figure 6.9 is caused by the cases in which the sum-product algorithm
requires thousands of iterations to converge. Therefore, the sum-product algorithm produces
good approximations of the exact marginals only if it converges in a few iterations. Otherwise,
the results generated by the sum-product algorithm are not a good approximation. We
provide below a numerical example in which the sum-product algorithm requires thousands of
iterations to converge.

Example 6.6 We provide the example for the Tanner graph shown in Figure 6.7-a, which is a
factor graph with just a single cycle and contains only binary variable nodes. Let the inputs
applied to the decoder represented by the Tanner graph shown in Figure 6.7-a, designed for
BPSK modulation and AWGN channel, be

\[ [-55\ \ 60\ \ -25\ \ -20\ \ 40\ \ 55\ \ 40\ \ -55]. \tag{6.75} \]
If one runs the sum-product algorithm on the Tanner graph shown in Figure 6.7-a with these
inputs, it can be observed that the sum-product algorithm achieves a reasonable convergence
only after at least 3000 iterations. Such input settings can be observed in a scenario in which
that decoder is employed as a MIMO detector for a 2 x 2 channel with coefficients

\[ \mathbf{H}_c = \begin{bmatrix} 1.5j & 1 - 0.5j \\ 1 + 0.5j & -0.5 - 1.5j \end{bmatrix} \]

and a sequence $[-j, 1]$ is transmitted when the noise has a variance $\sigma^2 = 0.01$.

We would like to note that the likelihoods given above are very unlikely to be observed in a
real channel decoding problem. Therefore, such likelihoods is probably never observed in / |53 
|59] 1?^ . Hence, they claimed that the sum-product algorithm produces good approximations 
for exact marginals if the sum-product algorithm converges. Unfortunately, this claim is not 
quite true as this counter example shows. 


Even if the sum-product algorithm produced good approximations in cases requiring thousands
of iterations to converge, a practical MIMO detection algorithm cannot wait that long
to complete the demodulation of a single MIMO symbol. Therefore, this late convergence
problem requires a solution in order to develop a practical MIMO detection algorithm with tail biting
decoders, which we provide in the next section.


6.6.5 Performance Improvements by using tail biting convolutional codes of longer 
constraint length 

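Recall that the tail biting decoder of $\mathbf{H}_{TB,MIMO}(N_t)$ is used for performing inference on
$\Pr\{\mathbf{V} = \mathbf{v} \mid \mathbf{Y} = \mathbf{y}\}$, where $\mathbf{V}$ is a permutation of $\mathbf{X}$ given by (6.62). This decoder can only perform
inference for this specific permutation of $\mathbf{X}$.

We define an extended version of the parity check matrix $\mathbf{H}_{TB,MIMO}(N_t)$ as in

\[ \mathbf{H}_{ETB,MIMO}(N_t) = \begin{bmatrix} \mathbf{C}_{TB}(N_t) & \mathbf{I}_{2N_t^2 \times 2N_t^2} \end{bmatrix}, \tag{6.76} \]

where $\mathbf{C}_{TB}(N_t)$ is

\[ \mathbf{C}_{TB}(N_t) = \begin{bmatrix} ((\mathbf{L}_{TB}(N_t+1)\ \ \mathbf{0}_{N_t \times (N_t-1)}))_0 \\ ((\mathbf{L}_{TB}(N_t+1)\ \ \mathbf{0}_{N_t \times (N_t-1)}))_1 \\ \vdots \\ ((\mathbf{L}_{TB}(N_t+1)\ \ \mathbf{0}_{N_t \times (N_t-1)}))_{2N_t-1} \end{bmatrix}. \tag{6.77} \]

Notice that $\mathbf{H}_{ETB,MIMO}(N_t)$ is the parity check matrix of a tail biting convolutional code of
rate $1/(N_t + 1)$ and of constraint length $N_t + 1$, and that it can be derived from $\mathbf{H}_{TB,MIMO}(N_t)$ by
adding $2N_t$ more parity checks.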


As opposed to the decoder of $\mathbf{H}_{TB,MIMO}(N_t)$, which can perform inference only on
$\Pr\{\mathbf{V} = \mathbf{v} \mid \mathbf{Y} = \mathbf{y}\}$, the decoder of $\mathbf{H}_{ETB,MIMO}(N_t)$ can be used to perform inference on
$\Pr\{\mathbf{V}_A = \mathbf{v} \mid \mathbf{Y} = \mathbf{y}\}$, where $\mathbf{V}_A$ is any permutation of $\mathbf{X}$.

An improved soft output MIMO detector can be implemented by using the approximate symbolwise
decoder of $\mathbf{H}_{ETB,MIMO}(N_t)$ instead of $\mathbf{H}_{TB,MIMO}(N_t)$. The main advantage of the
detector with the extended tail biting decoder when compared to the original tail biting decoder is that
it can work with any permutation of $\mathbf{X}$. Moreover, a certain permutation can work better for a
given $\mathbf{H}_c$ and noise realization while another permutation can work better with another $\mathbf{H}_c$ and
noise realization. This flexibility comes at the cost of increasing the trellis processing complexity
by a factor of two, which is acceptable.

We propose a soft output MIMO detection algorithm by using the approximate symbolwise
decoder of the extended tail biting code as follows.

1. Select a permutation $\mathbf{P}_A$ from a set $\mathcal{P}$ of permutations.

2. Apply the inputs properly permuted with the permutation $\mathbf{P}_A$ to the approximate symbolwise
decoder of $\mathbf{H}_{ETB,MIMO}(N_t)$.

3. Run the BCJR algorithm iteratively on the tail biting trellis until it converges or a maximum
number of iterations is reached.

4. If the iterative BCJR algorithm converges, declare its result as the output of the MIMO
detector and halt.

5. If the iterative BCJR algorithm does not converge, select another permutation $\mathbf{P}_A$ from
$\mathcal{P}$ and go to Step 2. If there is no remaining permutation in $\mathcal{P}$, then declare a failure.
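
The control flow of the five steps above can be sketched as follows. The decoder itself is not implemented here: `make_inputs` and `run_decoder` are hypothetical callables standing in for the input permutation of Step 2 and the iterative BCJR run of Step 3, and the default iteration limit is an arbitrary placeholder.

```python
def detect_with_retries(permutations, make_inputs, run_decoder, max_iterations=20):
    """Try the permutations in order until the iterative decoder converges.

    make_inputs(perm)       -> decoder inputs arranged for this permutation
    run_decoder(inputs, n)  -> (converged, soft_outputs) after at most n iterations
    Both callables are hypothetical placeholders supplied by the caller.
    Returns (perm, soft_outputs) on success, or None to signal a failure.
    """
    for perm in permutations:                      # Step 1, and Step 5 on retry
        inputs = make_inputs(perm)                 # Step 2
        converged, soft = run_decoder(inputs, max_iterations)  # Step 3
        if converged:                              # Step 4
            return perm, soft
    return None                                    # Step 5: set exhausted
```

Returning the permutation together with the soft outputs lets the caller map the decoder outputs back to the original bit order.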
Figure 6.10: BER performance of the MIMO detector using the extended tail biting decoder
with different permutations together with the MIMO detector with the normal tail biting decoder
and the symbolwise MAP MIMO detector in a Rayleigh fading 8x8 channel.


We tested this algorithm on the 8x8 MIMO channel described in Section 6.6.3. The set $\mathcal{P}$ we
used in these simulations consists of 15 specific permutations among 16! possible permutations.
These permutations are given in Appendix A.4.4. The BER performance of this MIMO detector is
given in Figure 6.10. These results show that the MIMO detector using the extended tail
biting decoder improves the error floor performance by an order of magnitude. Furthermore,
the BER performance before reaching the error floor is also improved significantly. The
improved MIMO detector is just 2 dB away from the optimum algorithm when it reaches the
error floor.

Recall that this MIMO detector is capable of using a priori information and produces soft
output. Hence, it can be easily used in an iterative detection-decoding scheme. In order to estimate
the possible performance of the improved MIMO detector in such an iterative scheme,
we computed extrinsic information transfer (EXIT) curves [35, 36, 37]. The area under the
EXIT curve of a MIMO detector is an approximate estimation of the maximum possible rate
of the code which can be used in an iterative detection-decoding scheme and can achieve
arbitrarily small error rate. In this aspect the area under the EXIT curve of the exact soft output MIMO detector is an
Figure 6.11: EXIT curves of the approximate MIMO detector using the extended tail biting
decoder and the exact soft output MIMO detector at $N_r E_b/N_0 = -0.96$ dB

Figure 6.12: EXIT curves of the approximate MIMO detector using the extended tail biting
decoder and the exact soft output MIMO detector at $N_r E_b/N_0 = 1.25$ dB

Figure 6.13: EXIT curves of the approximate MIMO detector using the extended tail biting
decoder and the exact soft output MIMO detector at $N_r E_b/N_0 = 6.02$ dB


approximate estimation of the MIMO channel capacity lIMIl . 

We computed the EXIT curves at three different SNR values. These results are shown in
Figures 6.11, 6.12, and 6.13. Since the area between the two EXIT curves in Figure 6.11
is negligible, the proposed algorithm can be used in an iterative detection-decoding scheme
with the same code as the optimum algorithm at low SNR or in the power limited region.
The EXIT curves shown in Figure 6.12 lead to a similar conclusion. The area between the two
EXIT curves becomes 0.04 in Figure 6.13. This means that the proposed algorithm can also
be used in the bandwidth limited region but at the cost of a rate loss of 0.04 bits, which is quite
acceptable.


6.7 Usage of the decoders of the convolutional codes as channel equalizers 


Let $X(t)$ be a stochastic process defined as follows.

\[ X(t) = \sum_n \eta_N(\mathbf{X}_n) f(t - nT), \tag{6.78} \]

where $f(t)$ is the impulse response of a pulse shaping filter and $\mathbf{X}_n$ is a random vector consisting
of $N$ bits. Furthermore, let $Y(t)$ be

\[ Y(t) = X(t) * g(t) + Z(t) = \sum_n \eta_N(\mathbf{X}_n) h(t - nT) + Z(t), \tag{6.79} \]


where Z(t) is a zero mean white Gaussian noise process with power spectral density * 
denotes convolution, $g(t)$ is the impulse response of a causal channel, and $h(t)$ is the convolution
of $g(t)$ and $f(t)$. Then it can be shown, by following procedures similar to those applied in the
previous sections, that the Viterbi and BCJR decoders of a certain convolutional code $C$ can
be used as the ML sequence estimator and the marginal APP receiver for this inter-symbol interference
system, respectively. This code $C$ is the non-recursive systematic convolutional code of
rate $1/NL$ and of constraint length $NL$, where $L$ is the smallest integer such that $h(t) = 0$ for
$t > LT$. The generator polynomials of this code are $1, 1 + D, 1 + D^2, \ldots, 1 + D^{NL-1}$.

The inputs that must be applied to these decoders to achieve the desired results consist of
samples taken from the output of the matched filter, i.e. $y(t) * h(-t)$, with sampling period
$T$, where $y(t)$ is the received signal, and samples taken from the time autocorrelation function
$h(t) * h(-t)$, again with sampling period $T$, with these samples scaled by powers of 2.²

The Viterbi decoder of the mentioned code above actually works as an alternative device to 
compute the Ungerboeck’s metric lldTI . Therefore, this result would be much more interesting 
if we had achieved it before Ungerboeck. However, using a Viterbi decoder as an alternative
device to compute Ungerboeck's metric might still be of practical importance since this approach
takes all of the multiplications outside of the Viterbi data path.

We have also empirically verified that the BCJR decoder of the convolutional code mentioned
above with the mentioned inputs returns the exact marginal APPs of the transmitted bits.


² We dropped conjugations since $\eta_N(\mathbf{X}_n)$ is real.
CHAPTER 7


DETERMINING CONDITIONAL INDEPENDENCE
RELATIONS FROM THE CANONICAL FACTORIZATION


7.1 Introduction 

Investigating the conditional independence relations of random variables is important in many 
different disciplines iflTl ITU . These conditional independence relationships are well rep¬ 
resented by a graphical model called Markov random field (MRF) or undirected graphical 
model. In this section we show that the MRF representing a joint pmf can be determined from
the projections of the joint pmf onto the subspaces described in Chapter 3.

This chapter begins with introducing the relation between conditional independence of two
random variables and the canonical factorization. Then we explain how to determine Markov
blankets from the canonical factorization. This chapter ends with comparing the canonical
factorization with the Hammersley-Clifford Theorem.


7.2 Conditional Independence of Two Random Variables 

Suppose that it is desired to determine the conditional independence relations between the
components of the random vector $\mathbf{X} = [X_1, X_2, \ldots, X_N]$, which is distributed with a
$p(\mathbf{x}) \in \mathcal{P}_{\mathbb{F}_q^N}$. Then a random variable $X_i$ is said to be conditionally independent of $X_j$ given all the
other components of $\mathbf{X}$ if and only if the following relation is satisfied:

\[ \Pr\{X_i = x_i \mid \mathbf{X}_{\setminus\{i\}} = \mathbf{x}_{\setminus\{i\}}\} = \Pr\{X_i = x_i \mid \mathbf{X}_{\setminus\{i,j\}} = \mathbf{x}_{\setminus\{i,j\}}\}, \tag{7.1} \]

where $\mathbf{X}_{\setminus I}$ ($\mathbf{x}_{\setminus I}$) denotes the vector obtained by removing the components having indices in
$I$ from $\mathbf{X}$ ($\mathbf{x}$). The following theorem states the necessary and sufficient conditions for the
conditional independence of two random variables in terms of the canonical factorization.


Theorem 7.1 Let $\mathbf{X}$ be a random vector distributed with $p(\mathbf{x})$ in $\mathcal{P}_{\mathbb{F}_q^N}$. $X_k$ and $X_l$ are conditionally
independent given $\mathbf{X}_{\setminus\{k,l\}}$ if and only if $p(\mathbf{x})$ can be factored as



(7.2) 


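where $\mathcal{H}_k$ and $\mathcal{H}_l$ are defined as

\[ \mathcal{H}_k \triangleq \{\mathbf{a}_i \in \mathcal{H} : \mathbf{f}_k \mathbf{a}_i^T = 0\}, \]
\[ \mathcal{H}_l \triangleq \{\mathbf{a}_i \in \mathcal{H} : \mathbf{f}_l \mathbf{a}_i^T = 0\}. \]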


The proof is given in Appendix A.5.1.

The forward statement of this theorem asserts that if none of the SPC factors composing the
canonical factorization of $p(\mathbf{x})$ depends on both $x_i$ and $x_j$ simultaneously, then $X_i$ and $X_j$ are
conditionally independent given $\mathbf{X}_{\setminus\{i,j\}}$. Actually, this result is true not only for the canonical
factorization but also for any factorization.

The backward statement of Theorem 7.1 states that if an SPC factor of $p(\mathbf{x})$ with nonzero norm
depends on $x_i$ and $x_j$ simultaneously, then $X_i$ and $X_j$ are definitely conditionally dependent
given $\mathbf{X}_{\setminus\{i,j\}}$. On the other hand, in an ordinary factorization a factor function may depend on
$x_i$ and $x_j$ together but $X_i$ and $X_j$ can still be conditionally independent given $\mathbf{X}_{\setminus\{i,j\}}$. Therefore,
the backward statement of Theorem 7.1 is specific to the canonical factorization and does not
hold for all factorizations in general. This fact is another reason why we call the proposed
factorization the canonical factorization.

7.3 Determining Markov Blankets and the Markov Random Field 

The Markov blanket of a random variable $X_i$, which is denoted by $\partial X_i$, is the smallest
possible set containing the components of $\mathbf{X}_{\setminus\{i\}}$ which satisfies

\[ \Pr\{X_i \mid \mathbf{X}_{\setminus\{i\}}\} = \Pr\{X_i \mid \partial X_i\}. \tag{7.3} \]

Clearly, $\partial X_i$ consists of the variables $X_j$ which are not conditionally independent of $X_i$ given
$\mathbf{X}_{\setminus\{i,j\}}$. Based on Theorem 7.1, $\partial X_i$ can be obtained in terms of projections onto the SPC
constraints as follows.


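Corollary 7.2 Let the canonical factorization of $p(\mathbf{x})$ be given by

$$p(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{\mathbf{a}_i \in \mathcal{H}} r_i(\mathbf{a}_i \mathbf{x}^T)\Big\}. \qquad (7.4)$$

$X_k$ is in $\partial X_l$ if and only if there exists a parity check coefficient vector $\mathbf{a}_i \in \mathcal{H}$ such that

$$\mathbf{e}_k \mathbf{a}_i^T \neq 0, \qquad \mathbf{e}_l \mathbf{a}_i^T \neq 0,$$

and

$$\|r_i(x)\| > 0.$$

Proof. If such a vector $\mathbf{a}_i$ exists then $p(\mathbf{x})$ cannot be factored as in (7.2) and hence $X_k$ and $X_l$
are conditionally dependent given $\mathbf{X}_{\setminus\{k,l\}}$ due to Theorem 7.1.

If there is no such $\mathbf{a}_i$ then $p(\mathbf{x})$ can be factored as in (7.2), which means that $X_k$ and $X_l$ are
conditionally independent given $\mathbf{X}_{\setminus\{k,l\}}$. ■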

The MRF is an undirected graphical model representing a probability distribution in which each
variable is represented by a node. The node representing $X_i$ is connected to the node
representing $X_j$ in the MRF if $X_j$ is in $\partial X_i$. Since the Markov blanket of every variable can
be determined from the canonical factorization by Corollary 7.2, the MRF can also be
determined from the canonical factorization.

Notice that every argument (i.e., every variable associated with a nonzero parity check
coefficient) of a non-constant SPC factor is in the Markov blankets of the other arguments of
that SPC factor. Therefore, the nodes representing these variables in the MRF are all pairwise
connected. In graph theoretic terminology, these nodes form a clique in the MRF. Hence, SPC
factors are functions of the cliques (not necessarily maximal) of the MRF.
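
The construction of the MRF from the SPC factors can be sketched as follows. This is a minimal illustration, assuming the canonical factorization is already available as a list of (parity check coefficient vector, factor norm) pairs; the function names and the example data are hypothetical.

```python
from itertools import combinations

def mrf_edges(spc_factors):
    """Edges of the MRF implied by the SPC factors with nonzero norm.

    spc_factors: iterable of (a, norm) pairs, where `a` is a parity check
    coefficient vector (tuple over F_q) and `norm` is the norm of its factor.
    """
    edges = set()
    for a, norm in spc_factors:
        if norm > 0:  # constant factors (zero norm) impose no dependency
            support = [k for k, coeff in enumerate(a) if coeff != 0]
            edges.update(combinations(support, 2))  # the support forms a clique
    return edges

def markov_blanket(edges, k):
    """Neighbours of node k in the MRF."""
    return {v for e in edges if k in e for v in e if v != k}

# Hypothetical factorization over four variables.
factors = [((1, 1, 0, 0), 0.4),   # couples X0 and X1
           ((0, 1, 1, 1), 0.9),   # couples X1, X2 and X3
           ((1, 0, 0, 1), 0.0),   # zero norm: contributes no edge
           ((0, 0, 1, 0), 0.2)]   # weight one: contributes no edge
edges = mrf_edges(factors)
print(sorted(edges))
print(markov_blanket(edges, 1))
```

Each non-constant factor adds all pairwise edges among the variables in its support, which is exactly the clique property stated above.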
7.4 Comparison to the Hammersley-Clifford Theorem


The relation between the factorization of a multivariate pmf and Markov properties was first
established by Hammersley and Clifford in [18, 19]. In this work they show that any strictly
positive multivariate pmf can be expressed as

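$$p(\mathbf{x}) = \prod_{\mathbf{D} \in \mathcal{D}_C} \phi_{\mathbf{D}}(\mathbf{D}\mathbf{x}), \qquad (7.5)$$

where each element of $\mathcal{D}_C$ is associated with a clique in the MRF. Moreover, their proof is
constructive. The factor functions are given as

$$\phi_{\mathbf{D}}(\mathbf{D}\mathbf{x}) \triangleq \prod_{\mathbf{D}' : \mathbf{D}'\mathbf{D} = \mathbf{D}'} p\big(\mathbf{D}'\mathbf{x} + (\mathbf{I} - \mathbf{D}')\mathbf{x}_b\big)^{(-1)^{\operatorname{rank}(\mathbf{D}) - \operatorname{rank}(\mathbf{D}')}} \qquad (7.6)$$

where $\mathbf{x}_b$ is a fixed configuration.¹ Although in both our approach and theirs the factor
functions appear to be functions of the cliques of the MRF, our approach differs significantly
from theirs in many aspects.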


First of all, the dependencies between the random variables imposed by the factor functions in
(7.6) are rather arbitrary. SPC factors, on the other hand, impose an algebraic form of
dependency. In other words, SPC factors explain how a random variable is related to a linear
combination of other variables. This property is quite important and allows us to express an
inference problem as a decoding problem.

Second, the factor functions defined in (7.6) depend on a certain fixed configuration $\mathbf{x}_b$. A
different factorization is obtained for each different $\mathbf{x}_b$. Therefore, the factorization proposed
by Hammersley and Clifford is not unique. On the other hand, the canonical factorization is
unique, as explained in Chapter 4.

In addition, there is at most one factor function per clique in the factorization given in (7.5),
whereas there may be more than one SPC factor depending on the same set of variables in
non-binary fields.


Finally, the applicability of our approach is more restricted than that of Hammersley and
Clifford. Our method is applicable only if the event space of the combined experiment can
be mapped to $\mathbb{F}_q^N$, whereas the Hammersley-Clifford theorem is applicable to any strictly
positive pmf. Moreover, it should be emphasized that both approaches are applicable to strictly
positive pmfs only.

¹ This configuration corresponds to the all-black coloring in [18, 19].
CHAPTER 8


Conclusions and Future Directions 


8.1 Summary 

In this thesis the Hilbert space of pmfs is introduced. Then the tools provided by this Hilbert
space are utilized to develop an analysis method for multivariate pmfs. The aim of this
analysis method is to obtain a factorization of the multivariate pmf. The factorization resulting
from this analysis method possesses some important properties. First of all, it is the ultimate
factorization possible. Secondly, it is unique. Thirdly, the conditional independence relations
can be determined completely from this factorization. Probably the most important property
of the resulting factorization is that it reveals the algebraic dependencies between the
involved random variables. Thanks to this fact, probabilistic inference problems can be
transformed into channel decoding problems, and channel decoders can be used for tasks
beyond decoding. Many examples are provided in the thesis on how channel decoders can be
used as detectors in communication receivers. It is also shown that the decoders of tail biting
convolutional codes can be used as MIMO detectors. This approach results in a significant
reduction in complexity while maintaining good performance.


8.2 Future directions 

The application of the Hilbert space of pmfs presented in this thesis is the canonical
factorization. We believe that the Hilbert space of pmfs might lead to further applications in
communication theory, information theory, and probabilistic inference.

The most important consequence of the canonical factorization is that it shows how to
employ channel decoders for other purposes. The MIMO detector which uses the decoder of
a tail biting convolutional code demonstrates that new detection and probabilistic inference
algorithms can be developed by using channel decoders for tasks beyond decoding.

Employing channel decoders for other tasks also allows the analog probability propagation
method proposed in [14, 15] to be applied to other probabilistic inference problems. In particular,
by implementing channel equalizers and MIMO detectors with analog probability propagation,
much more power-efficient communication receivers can be implemented. We anticipate
that this direction will be the most important application area of this thesis.

Some other possible future directions are summarized below. 

8.2.1 Applications on machine learning 

Estimating the factorization of a joint pmf from samples generated from the pmf is an
important problem in machine learning, e.g. [20]. A straightforward approach after this thesis
could be to estimate the joint pmf first and then obtain the canonical factorization by applying
the procedure explained in Chapter 3. However, such an approach requires both too many
samples to estimate the joint pmf accurately and extensive computational resources to obtain the
canonical factorization. A more interesting solution to this problem might be proposed by
combining the results obtained in this thesis and the results presented in [32]. By combining
these results it can be concluded that the necessary algorithm for estimating the factorization
of a joint pmf from samples is exactly the inverse of the sum-product algorithm.

As explained in Section 4.3, the ultimate factorization of a pmf is the canonical factorization.
The equivalent Tanner graph representing the canonical factorization is shown in Figure
5.3b. Hence, estimating the canonical factorization is equivalent to estimating all of the local
evidences in this Tanner graph.

Let $\mathbf{X} = [X_1, X_2, \ldots, X_N]$ be distributed with a $p(\mathbf{x})$ in $\mathcal{P}_{\mathbb{F}_q^N}$. Estimating all the marginals
$\Pr\{X_i = x_i\}$ from experimental data is much easier than estimating the joint distribution $p(\mathbf{x})$
from data. Let

$$X_i = \mathbf{a}_i \mathbf{X}^T, \quad \text{for } i = N+1, N+2, \ldots, |\mathcal{H}|,$$

where $\mathbf{a}_{N+1}, \mathbf{a}_{N+2}, \ldots, \mathbf{a}_{|\mathcal{H}|}$ are the elements of $\mathcal{H}$ of weight two or more, as we assumed in
Chapter 5. Since $X_i$ for $i > N$ is completely determined by $\mathbf{X}$, the marginal distributions of $X_i$
for $i > N$ can also be estimated from the data. Consequently, the marginal distributions of $X_1$,
$X_2$, ..., $X_{|\mathcal{H}|}$ can be easily estimated from the experimental data.

However, what we need in order to estimate the canonical factorization is not the marginal
distributions of the random variables $X_1$, $X_2$, ..., $X_{|\mathcal{H}|}$, but the local evidences in Figure 5.3b.
Therefore, we need an algorithm which computes the local evidences from the marginals.
Notice that this task is exactly the inverse of the sum-product algorithm, as the sum-product
algorithm computes the marginals from local evidences.

A question might arise on the existence and uniqueness of the set of local evidences
corresponding to a set of marginals. Indeed, if the Tanner graph in Figure 5.3b represented
an arbitrary code, then we might not find a set of local evidences resulting in a given set of
marginal distributions at all, or might find more than one set of local evidences resulting in
the same set of marginal distributions. In our case, however, any linear combination of the
components of $\mathbf{X}$ is equal to $\alpha X_i$ for an $\alpha \in \mathbb{F}_q$ and $1 \le i \le |\mathcal{H}|$. Massey showed in [32] that the
marginal distributions of the linear combinations of a sequence of random variables are enough to
specify their joint distribution. Hence, the marginal distributions of $X_1$, $X_2$, ..., $X_{|\mathcal{H}|}$ uniquely
specify $p(\mathbf{x})$ and consequently its canonical factorization.
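
For the binary field, the claim that the marginals of all linear combinations determine the joint pmf can be illustrated with the Walsh-Hadamard transform. The sketch below is an assumption-laden illustration for $q = 2$ only, not the argument of [32] and not an inverse of the sum-product algorithm; the function names are hypothetical. It uses the identity $E[(-1)^{\mathbf{a}\mathbf{X}^T}] = 1 - 2\Pr\{\mathbf{a}\mathbf{X}^T = 1\}$ and inverts the transform.

```python
from itertools import product

def dot2(a, x):
    """Inner product of two binary vectors over F_2."""
    return sum(ai * xi for ai, xi in zip(a, x)) % 2

def combo_marginals(p, n):
    """Pr{a X^T = 1} for every binary coefficient vector a (including a = 0)."""
    return {a: sum(px for x, px in p.items() if dot2(a, x) == 1)
            for a in product(range(2), repeat=n)}

def joint_from_marginals(m, n):
    """Recover p(x) by inverting the Walsh-Hadamard transform of the pmf."""
    return {x: sum((1 - 2 * m[a]) * (-1) ** dot2(a, x) for a in m) / 2 ** n
            for x in product(range(2), repeat=n)}

# Hypothetical strictly positive pmf on F_2^3.
n = 3
weights = [1, 2, 3, 4, 5, 6, 7, 8]
p = {x: w / sum(weights) for x, w in zip(product(range(2), repeat=n), weights)}

m = combo_marginals(p, n)
p_rec = joint_from_marginals(m, n)
print(max(abs(p[x] - p_rec[x]) for x in p))  # numerically zero
```

The example only shows that no information is lost when passing from the joint pmf to the marginals of the linear combinations; computing local evidences from these marginals remains the open problem described in the text.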

To the best of our knowledge, neither an exact nor an approximate version of the inverse of the
sum-product algorithm is known. As explained above, developing the inverse of the sum-product
algorithm would solve an important problem in machine learning.


8.2.2 Using channel decoders for channel estimation 

In the examples presented in Chapter 6 we assumed that the channel coefficients are
completely known at the receiver. In a practical communication receiver, the channel coefficients
must be estimated. Employing channel decoders for channel estimation would be very
interesting.

Actually, the channel estimation problem does not perfectly fit into the framework presented
in this thesis, since the channel coefficients take values from a continuous alphabet rather
than a finite alphabet. The apparent solution to this problem might be quantizing the channel
coefficients. However, such an approach would lead to a factor graph topologically equivalent
to the one in [42], which contains too many short cycles. Hence, such an approach probably
will not be useful.


While employing decoders for detection, we observed that the channel coefficients and the 
channel outputs appeared as the parameters of the canonical factorization of the transmitted 
bits. Therefore, a more interesting approach might be bypassing the channel estimation step 
and estimating the canonical factorization of the joint pmf of the transmitted bits and the quan¬ 
tized channel outputs directly from a pilot sequence. This approach transforms the channel 
estimation problem into a machine learning problem a solution to which is conjectured in the 
previous section. 
APPENDIX A


PROOFS AND DERIVATIONS 


A.1 Proofs and derivations in Chapter 2

A.1.1 Proof of Lemma 2.2

The function $\sigma(p(x), r(x))$ defined in (2.19) is an inner product on $\mathcal{P}_{\mathbb{F}_q}$ if it satisfies the three
inner product axioms stated below.


• Symmetry: This property of $\sigma(\cdot,\cdot)$ is directly inherited from the inner product on $\mathbb{R}^q$.


• Linearity w.r.t. the first argument: If $\mathcal{M}\{\cdot\}$ is linear, this property is also inherited from the
inner product on $\mathbb{R}^q$.


• Positive definiteness: For any $p(x) \in \mathcal{P}_{\mathbb{F}_q}$,

$$\sigma(p(x), p(x)) = \langle \mathcal{M}\{p(x)\}, \mathcal{M}\{p(x)\} \rangle \ge 0$$

due to the non-negativity of the inner product on $\mathbb{R}^q$. Equality is satisfied only if
$\mathcal{M}\{p(x)\}$ equals $\mathbf{0}$. Since $\mathcal{M}\{\cdot\}$ is linear and an injection, $\mathcal{M}\{p(x)\}$ is equal to $\mathbf{0}$ if and
only if $p(x) = \theta(x)$.
A.1.2 Rationale behind the proposal for $\mathcal{L}\{\cdot\}$

The trivial way of mapping a pmf $p(x) \in \mathcal{P}_{\mathbb{F}_q}$ to a vector $\mathbf{p} \in \mathbb{R}^q$ is making the $i$-th* component
of $\mathbf{p}$ equal to $p(i)$. Let this trivial mapping be denoted by $\mathcal{T}\{\cdot\}$, i.e.,

$$\mathcal{T}\{p(x)\} = \sum_{i \in \mathbb{F}_q} p(i)\,\mathbf{e}_i.$$

Although this mapping is injective, it is obviously nonlinear. Therefore, $\mathcal{T}\{\cdot\}$ does not satisfy
one of the two requirements imposed by Lemma 2.2 and consequently it cannot be employed
as a tool for borrowing the inner product on $\mathbb{R}^q$. However, we can define a notion of
angle between pmfs using $\mathcal{T}\{\cdot\}$ and then reach a proposal for a mapping which satisfies the
requirements of Lemma 2.2.


Whatever the definition of the angle between two pmfs is, the sine of the angle should be kept
constant if the two pmfs are scaled by some nonzero scalars. In other words, for any $p(x), r(x) \in \mathcal{P}_{\mathbb{F}_q}$
and $\alpha, \beta \in \mathbb{R} \setminus \{0\}$,

$$\sin \angle(p(x), r(x)) = \sin \angle(\alpha \boxtimes p(x), \beta \boxtimes r(x)),$$

where $\angle(p(x), r(x))$ denotes the angle between $p(x)$ and $r(x)$. This property of the angle imposes
that the angle between two pmfs should be a function of the two parametric curves on $\mathbb{R}^q$
based on $p(x)$ and $r(x)$ as follows.

$$\mathbf{c}_p(t) = \mathcal{T}\{t \boxtimes p(x)\},$$
$$\mathbf{c}_r(t) = \mathcal{T}\{t \boxtimes r(x)\}.$$

For $t = 0$ both of these curves pass through $\frac{1}{q}\mathbf{1}$. An example consisting of a pair of such
curves is depicted in Figure A.1. Then we can reasonably define the angle between
$p(x)$ and $r(x)$ as the angle between $\mathbf{c}_p(t)$ and $\mathbf{c}_r(t)$ at their intersection point.


In order to derive the angle between $\mathbf{c}_p(t)$ and $\mathbf{c}_r(t)$, we need to derive vectors tangent to these
curves at $t = 0$. The expression defining $\mathbf{c}_p(t)$ can be simplified as

$$\mathbf{c}_p(t) = \sum_{i \in \mathbb{F}_q} \frac{(p(i))^t}{\sum_{j \in \mathbb{F}_q} (p(j))^t}\,\mathbf{e}_i.$$


* We enumerate the components of the vector with the elements of $\mathbb{F}_q$ instead of positive integers.
Figure A.1: A pair of parametric curves obtained by scaling two pmfs and then mapping
them via the trivial mapping.


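Let $\mathbf{t}_p$ denote the vector which is tangent to $\mathbf{c}_p(t)$ at $t = 0$. Then $\mathbf{t}_p$ can be derived using
differentiation as

$$\mathbf{t}_p = \frac{d}{dt}\mathbf{c}_p(t)\Big|_{t=0} = \frac{1}{q}\sum_{i \in \mathbb{F}_q}\Big(\log p(i) - \frac{1}{q}\sum_{j \in \mathbb{F}_q} \log p(j)\Big)\mathbf{e}_i.$$

Inspired by this equation, we proposed the mapping $\mathcal{L}\{\cdot\}$ as

$$\mathcal{L}\{p(x)\} \triangleq q\,\mathbf{t}_p = \sum_{i \in \mathbb{F}_q}\Big(\log p(i) - \frac{1}{q}\sum_{j \in \mathbb{F}_q} \log p(j)\Big)\mathbf{e}_i.$$

Since $\mathcal{L}\{\cdot\}$ is defined as above, the angle between the two curves $\mathbf{c}_p(t)$ and $\mathbf{c}_r(t)$, which is
proposed to be the angle between $p(x)$ and $r(x)$, is equal to the angle between $p(x)$ and
$r(x)$ on $\mathcal{P}_{\mathbb{F}_q}$ defined in (2.28).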
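
The tangent vector can be checked numerically. The following sketch is an illustration only, assuming that scalar multiplication $t \boxtimes p(x)$ is the normalized $t$-th power of the pmf; the function names and the example pmf are hypothetical. It compares a central finite difference of $\mathbf{c}_p(t)$ at $t = 0$ with $\frac{1}{q}\mathcal{L}\{p(x)\}$.

```python
import math

def scale(t, p):
    """t (boxtimes) p: the normalized t-th power of the pmf p (list of probs)."""
    w = [pi ** t for pi in p]
    s = sum(w)
    return [wi / s for wi in w]

def L(p):
    """Centered log-probability vector: log p(i) minus the mean of log p."""
    logs = [math.log(pi) for pi in p]
    mean = sum(logs) / len(p)
    return [l - mean for l in logs]

p = [0.5, 0.3, 0.2]          # hypothetical pmf on F_3
h = 1e-5                     # finite-difference step

# Central difference of c_p(t) = T{t (boxtimes) p} around t = 0.
fd = [(u - v) / (2 * h) for u, v in zip(scale(h, p), scale(-h, p))]
tangent = [li / len(p) for li in L(p)]

print(scale(0.0, p))         # the curve passes through (1/q) * 1 at t = 0
print(max(abs(a - b) for a, b in zip(fd, tangent)))
```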
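A.1.3 Proof of Lemma 2.3


First we are going to prove that $\mathcal{L}\{\cdot\}$ is linear and then that it is an injection. For any
$p(x), r(x) \in \mathcal{P}_{\mathbb{F}_q}$,

$$\mathcal{L}\{p(x) \boxplus r(x)\} = \sum_{i \in \mathbb{F}_q}\Big(\log \mathcal{C}_{\mathbb{F}_q}\{p(x)r(x)\}\big|_{x=i} - \frac{1}{q}\sum_{j \in \mathbb{F}_q}\log \mathcal{C}_{\mathbb{F}_q}\{p(x)r(x)\}\big|_{x=j}\Big)\mathbf{e}_i$$
$$= \sum_{i \in \mathbb{F}_q}\Big(\log \frac{1}{\gamma}p(i)r(i) - \frac{1}{q}\sum_{j \in \mathbb{F}_q}\log \frac{1}{\gamma}p(j)r(j)\Big)\mathbf{e}_i$$
$$= \sum_{i \in \mathbb{F}_q}\Big(\log p(i) - \frac{1}{q}\sum_{j \in \mathbb{F}_q}\log p(j)\Big)\mathbf{e}_i + \sum_{i \in \mathbb{F}_q}\Big(\log r(i) - \frac{1}{q}\sum_{j \in \mathbb{F}_q}\log r(j)\Big)\mathbf{e}_i$$
$$= \mathcal{L}\{p(x)\} + \mathcal{L}\{r(x)\},$$

where $\gamma$ in the second line above is $\sum_{i \in \mathbb{F}_q} p(i)r(i)$. Hence, $\mathcal{L}\{\cdot\}$ is additive. For any $p(x) \in \mathcal{P}_{\mathbb{F}_q}$
and $\alpha \in \mathbb{R}$,

$$\mathcal{L}\{\alpha \boxtimes p(x)\} = \sum_{i \in \mathbb{F}_q}\Big(\log \mathcal{C}_{\mathbb{F}_q}\{(p(x))^\alpha\}\big|_{x=i} - \frac{1}{q}\sum_{j \in \mathbb{F}_q}\log \mathcal{C}_{\mathbb{F}_q}\{(p(x))^\alpha\}\big|_{x=j}\Big)\mathbf{e}_i$$
$$= \alpha\sum_{i \in \mathbb{F}_q}\Big(\log p(i) - \frac{1}{q}\sum_{j \in \mathbb{F}_q}\log p(j)\Big)\mathbf{e}_i$$
$$= \alpha\mathcal{L}\{p(x)\}.$$

Hence, $\mathcal{L}\{\cdot\}$ is homogeneous and consequently a linear mapping.

A linear mapping is injective if its kernel (null space) is composed of only the additive identity.
If $\mathcal{L}\{p(x)\} = \mathbf{0}$ for a $p(x) \in \mathcal{P}_{\mathbb{F}_q}$ then

$$\log p(i) - \frac{1}{q}\sum_{j \in \mathbb{F}_q}\log p(j) = 0 \quad \forall i \in \mathbb{F}_q,$$
$$p(i) = \exp\Big(\frac{1}{q}\sum_{j \in \mathbb{F}_q}\log p(j)\Big) \quad \forall i \in \mathbb{F}_q,$$

which is possible only if $p(x) = \frac{1}{q}$ or equivalently $p(x) = \theta(x)$. Since the kernel of $\mathcal{L}\{\cdot\}$
consists of only $\theta(x)$, which is the additive identity in $\mathcal{P}_{\mathbb{F}_q}$, the mapping $\mathcal{L}\{\cdot\}$ is injective.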
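
The linearity shown above can be verified numerically. The sketch below is an illustration under the assumption that addition of pmfs is the normalized pointwise product and scalar multiplication is the normalized power; the helper names and the example pmfs are hypothetical.

```python
import math

def normalize(w):
    s = sum(w)
    return [wi / s for wi in w]

def add(p, r):
    """p (boxplus) r: normalized pointwise product."""
    return normalize([a * b for a, b in zip(p, r)])

def scale(alpha, p):
    """alpha (boxtimes) p: normalized pointwise power."""
    return normalize([a ** alpha for a in p])

def L(p):
    """Centered log-probability vector of the pmf p."""
    logs = [math.log(a) for a in p]
    mean = sum(logs) / len(p)
    return [l - mean for l in logs]

p = [0.5, 0.3, 0.2]
r = [0.1, 0.6, 0.3]
alpha = -1.7

lhs = L(add(scale(alpha, p), r))                      # L{alpha p (+) r}
rhs = [alpha * a + b for a, b in zip(L(p), L(r))]     # alpha L{p} + L{r}
uniform = [1.0 / 3.0] * 3

print(max(abs(a - b) for a, b in zip(lhs, rhs)))      # numerically zero
print(L(uniform))                                     # the zero vector
```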


A.1.4 Expressing the inner product on $\mathcal{P}_{\mathbb{F}_q}$ as a covariance

Let $X$ be an $\mathbb{F}_q$-valued random variable. Then $\log p(X)$ and $\log r(X)$ are two real-valued
functions of an $\mathbb{F}_q$-valued random variable. Their expectations and covariance are well-defined.
Clearly, the inner product of $p(x)$ and $r(x)$ can be expressed as

$$\langle p(x), r(x) \rangle = q\,E\big[(\log p(X) - E[\log p(X)])(\log r(X) - E[\log r(X)])\big]$$
$$= q\,\big(E[\log p(X)\log r(X)] - E[\log p(X)]\,E[\log r(X)]\big),$$

where $E[\cdot]$ denotes expectation and $X$ is a uniformly distributed random variable in $\mathbb{F}_q$, i.e.

$$\Pr\{X = x\} = \theta(x).$$
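
The covariance form can be compared against the definition of the inner product through the centered log-probability vectors. The sketch below is illustrative; the function names and the example pmfs are hypothetical.

```python
import math

def inner(p, r):
    """Inner product of two pmfs via their centered log-probability vectors."""
    lp = [math.log(a) for a in p]
    lr = [math.log(b) for b in r]
    mp, mr = sum(lp) / len(p), sum(lr) / len(r)
    return sum((a - mp) * (b - mr) for a, b in zip(lp, lr))

def q_times_covariance(p, r):
    """q * (E[log p log r] - E[log p] E[log r]) with X uniform on F_q."""
    q = len(p)
    lp = [math.log(a) for a in p]
    lr = [math.log(b) for b in r]
    e_pr = sum(a * b for a, b in zip(lp, lr)) / q
    e_p, e_r = sum(lp) / q, sum(lr) / q
    return q * (e_pr - e_p * e_r)

p = [0.5, 0.3, 0.2]
r = [0.1, 0.6, 0.3]
print(inner(p, r), q_times_covariance(p, r))  # the two values agree
```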


A.1.5 Proof of Lemma [23] 


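For any $p(x)$ in $\mathcal{P}_{\mathbb{F}_q}$,

$$\langle \mathcal{L}\{p(x)\}, \mathbf{1} \rangle_{\mathbb{R}^q} = \Big\langle \sum_{i \in \mathbb{F}_q}\Big(\log p(i) - \frac{1}{q}\sum_{j \in \mathbb{F}_q}\log p(j)\Big)\mathbf{e}_i, \mathbf{1} \Big\rangle_{\mathbb{R}^q}$$
$$= \sum_{i \in \mathbb{F}_q}\Big(\log p(i) - \frac{1}{q}\sum_{j \in \mathbb{F}_q}\log p(j)\Big)$$
$$= \sum_{i \in \mathbb{F}_q}\log p(i) - \sum_{j \in \mathbb{F}_q}\log p(j)$$
$$= 0,$$

which completes the proof.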


A.1.6 Proof of Lemma [2^ 

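First we are going to simplify the expression defining $\mathcal{L}^{-1}\{\mathbf{p}\}(x)$.

$$\mathcal{L}^{-1}\{\mathbf{p}\}(x) = \mathcal{C}_{\mathbb{F}_q}\Big\{\exp\Big(-\frac{1}{2}\|\mathbf{p} - \mathbf{s}(x)\|^2\Big)\Big\}$$
$$= \mathcal{C}_{\mathbb{F}_q}\Big\{\exp\Big(-\frac{1}{2}\big(\|\mathbf{p}\|^2 - 2\langle \mathbf{p}, \mathbf{s}(x) \rangle_{\mathbb{R}^q} + \|\mathbf{s}(x)\|^2\big)\Big)\Big\}$$
$$= \mathcal{C}_{\mathbb{F}_q}\Big\{\exp\Big(-\frac{\|\mathbf{p}\|^2}{2}\Big)\exp\big(\langle \mathbf{p}, \mathbf{s}(x) \rangle_{\mathbb{R}^q}\big)\exp\Big(-\frac{\|\mathbf{s}(x)\|^2}{2}\Big)\Big\}$$

Since $\|\mathbf{p}\|$ and $\|\mathbf{s}(x)\|$ are constant for all $x$, the product $\exp(-\|\mathbf{p}\|^2/2)\exp(-\|\mathbf{s}(x)\|^2/2)$ is
cancelled due to the normalization operator. Therefore,

$$\mathcal{L}^{-1}\{\mathbf{p}\}(x) = \mathcal{C}_{\mathbb{F}_q}\big\{\exp\big(\langle \mathbf{p}, \mathbf{s}(x) \rangle_{\mathbb{R}^q}\big)\big\}. \qquad (A.1)$$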

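If $\mathbf{p}$ is equal to $\mathcal{L}\{p(x)\}$ for a $p(x)$ in $\mathcal{P}_{\mathbb{F}_q}$ then the inner product above becomes

$$\langle \mathbf{p}, \mathbf{s}(x) \rangle_{\mathbb{R}^q} = \Big\langle \mathcal{L}\{p(x)\}, \mathbf{e}_x - \frac{1}{q}\mathbf{1} \Big\rangle = \langle \mathcal{L}\{p(x)\}, \mathbf{e}_x \rangle - \frac{1}{q}\langle \mathcal{L}\{p(x)\}, \mathbf{1} \rangle.$$

Due to the lemma proved in Appendix A.1.5, the second inner product above is zero. Inserting this
result into (A.1) yields

$$\mathcal{L}^{-1}\{\mathcal{L}\{p(x)\}\}(x) = \mathcal{C}_{\mathbb{F}_q}\Big\{\exp\Big(\log p(x) - \frac{1}{q}\sum_{j \in \mathbb{F}_q}\log p(j)\Big)\Big\}$$
$$= \mathcal{C}_{\mathbb{F}_q}\{\exp(\log p(x))\}$$
$$= p(x),$$

where the summation in the first line above is cancelled by the normalization operator since
it is a constant. This completes the proof of the first part of the lemma.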

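Any $\mathbf{p} \in \mathbb{R}^q$ can be decomposed as

$$\mathbf{p} = \mathbf{p}' + \alpha\mathbf{1},$$

for an $\alpha$ in $\mathbb{R}$ such that $\mathbf{p}' \perp \mathbf{1}$. Inserting this decomposition into (A.1) yields

$$\mathcal{L}^{-1}\{\mathbf{p}\}(x) = \mathcal{C}_{\mathbb{F}_q}\big\{\exp\big(\langle \mathbf{p}' + \alpha\mathbf{1}, \mathbf{s}(x) \rangle_{\mathbb{R}^q}\big)\big\}$$
$$= \mathcal{C}_{\mathbb{F}_q}\big\{\exp\big(\langle \mathbf{p}', \mathbf{s}(x) \rangle_{\mathbb{R}^q} + \alpha\langle \mathbf{1}, \mathbf{s}(x) \rangle_{\mathbb{R}^q}\big)\big\}.$$

$\mathbf{s}(x)$ is orthogonal to $\mathbf{1}$ for all $x$. Therefore,

$$\mathcal{L}^{-1}\{\mathbf{p}\}(x) = \mathcal{C}_{\mathbb{F}_q}\big\{\exp\big(\langle \mathbf{p}', \mathbf{s}(x) \rangle_{\mathbb{R}^q}\big)\big\},$$

$$\mathcal{L}\{\mathcal{L}^{-1}\{\mathbf{p}\}(x)\} = \sum_{i \in \mathbb{F}_q}\Big(\log \frac{1}{\gamma}\exp\big(\langle \mathbf{p}', \mathbf{s}(i) \rangle_{\mathbb{R}^q}\big) - \frac{1}{q}\sum_{j \in \mathbb{F}_q}\log \frac{1}{\gamma}\exp\big(\langle \mathbf{p}', \mathbf{s}(j) \rangle_{\mathbb{R}^q}\big)\Big)\mathbf{e}_i$$
$$= \sum_{i \in \mathbb{F}_q}\Big(\langle \mathbf{p}', \mathbf{s}(i) \rangle_{\mathbb{R}^q} - \frac{1}{q}\Big\langle \mathbf{p}', \sum_{j \in \mathbb{F}_q}\mathbf{s}(j) \Big\rangle_{\mathbb{R}^q}\Big)\mathbf{e}_i,$$

where $\gamma = \sum_{j \in \mathbb{F}_q}\exp(\langle \mathbf{p}', \mathbf{s}(j) \rangle_{\mathbb{R}^q})$. $\sum_{j \in \mathbb{F}_q}\mathbf{s}(j)$ is equal to the zero vector. Therefore,

$$\mathcal{L}\{\mathcal{L}^{-1}\{\mathbf{p}\}(x)\} = \sum_{i \in \mathbb{F}_q}\langle \mathbf{p}', \mathbf{s}(i) \rangle_{\mathbb{R}^q}\,\mathbf{e}_i$$
$$= \sum_{i \in \mathbb{F}_q}\Big(\langle \mathbf{p}', \mathbf{e}_i \rangle_{\mathbb{R}^q} - \frac{1}{q}\langle \mathbf{p}', \mathbf{1} \rangle_{\mathbb{R}^q}\Big)\mathbf{e}_i$$
$$= \sum_{i \in \mathbb{F}_q}\langle \mathbf{p}', \mathbf{e}_i \rangle_{\mathbb{R}^q}\,\mathbf{e}_i = \mathbf{p}'.$$

If $\mathbf{p}$ is orthogonal to $\mathbf{1}$ then $\mathbf{p}$ becomes equal to $\mathbf{p}'$ and consequently

$$\mathcal{L}\{\mathcal{L}^{-1}\{\mathbf{p}\}(x)\} = \mathbf{p}.$$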
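
Both parts of the lemma can be checked numerically. The sketch below is an illustration, assuming $\mathbf{s}(x) = \mathbf{e}_x - \frac{1}{q}\mathbf{1}$ and the simplified form (A.1) of the inverse mapping; the function names and the test vectors are hypothetical.

```python
import math

def L(p):
    """Centered log-probability vector of the pmf p."""
    logs = [math.log(a) for a in p]
    mean = sum(logs) / len(p)
    return [l - mean for l in logs]

def L_inv(v):
    """Normalized exp(<v, s(x)>), where <v, s(x)> = v[x] - mean(v)."""
    mean = sum(v) / len(v)
    w = [math.exp(vx - mean) for vx in v]
    s = sum(w)
    return [wi / s for wi in w]

p = [0.5, 0.3, 0.2]
v = [1.0, -2.0, 4.0]                      # not orthogonal to the all-ones vector
v_proj = [vx - sum(v) / len(v) for vx in v]

print(L_inv(L(p)))      # recovers p
print(L(L_inv(v)))      # recovers the component of v orthogonal to 1
```

The second printout equals `v_proj`, the part of `v` orthogonal to the all-ones vector, in line with the decomposition $\mathbf{p} = \mathbf{p}' + \alpha\mathbf{1}$ used in the proof.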


A.2 Proofs and derivations in Chapter 3 

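A.2.1 Proof of Lemma 3.1


Inserting the expressions for $p_1(\mathbf{x})$ and $p_2(\mathbf{x})$ into the inner product definition yields

$$\langle p_1(\mathbf{x}), p_2(\mathbf{x}) \rangle = \sum_{\mathbf{i} \in \mathbb{F}_q^N}\Big(\log \frac{r_1(\mathbf{a}\mathbf{i}^T)}{q^{N-1}} - \frac{1}{q^N}\sum_{\mathbf{j} \in \mathbb{F}_q^N}\log \frac{r_1(\mathbf{a}\mathbf{j}^T)}{q^{N-1}}\Big)\Big(\log \frac{r_2(\mathbf{b}\mathbf{i}^T)}{q^{N-1}} - \frac{1}{q^N}\sum_{\mathbf{j} \in \mathbb{F}_q^N}\log \frac{r_2(\mathbf{b}\mathbf{j}^T)}{q^{N-1}}\Big)$$
$$= \sum_{\mathbf{i} \in \mathbb{F}_q^N}\log r_1(\mathbf{a}\mathbf{i}^T)\log r_2(\mathbf{b}\mathbf{i}^T) - \frac{1}{q^N}\sum_{\mathbf{i} \in \mathbb{F}_q^N}\log r_1(\mathbf{a}\mathbf{i}^T)\sum_{\mathbf{i} \in \mathbb{F}_q^N}\log r_2(\mathbf{b}\mathbf{i}^T). \qquad (A.2)$$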

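First we are going to derive the inner product of the two SPC constraints if there exists an
$\alpha \in \mathbb{F}_q$ such that $\mathbf{b} = \alpha\mathbf{a}$. Since $\mathbb{F}_q$ is a field with $q$ elements and $\mathbf{a}$ is nonzero, there are $q^{N-1}$
vectors $\mathbf{i}$ satisfying the equation $\mathbf{a}\mathbf{i}^T = j$ for each $j \in \mathbb{F}_q$. Hence,

$$\langle p_1(\mathbf{x}), p_2(\mathbf{x}) \rangle = q^{N-1}\sum_{j \in \mathbb{F}_q}\log r_1(j)\log r_2(\alpha j) - \frac{1}{q^N}\Big(q^{N-1}\sum_{j \in \mathbb{F}_q}\log r_1(j)\Big)\Big(q^{N-1}\sum_{j \in \mathbb{F}_q}\log r_2(\alpha j)\Big)$$
$$= q^{N-1}\Big(\sum_{j \in \mathbb{F}_q}\log r_1(j)\log r_2(\alpha j) - \frac{1}{q}\sum_{j \in \mathbb{F}_q}\log r_1(j)\sum_{j \in \mathbb{F}_q}\log r_2(\alpha j)\Big)$$
$$= q^{N-1}\langle r_1(x), r_2(\alpha x) \rangle,$$

which completes the proof for the first part.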


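In the second part, we derive the inner product of $p_1(\mathbf{x})$ and $p_2(\mathbf{x})$ when there is no $\alpha \in \mathbb{F}_q$
such that $\mathbf{b} = \alpha\mathbf{a}$. In other words, $\mathbf{a}$ and $\mathbf{b}$ are linearly independent. The first summation in
(A.2) can be regrouped for this case as follows.

$$\sum_{\mathbf{i} \in \mathbb{F}_q^N}\log r_1(\mathbf{a}\mathbf{i}^T)\log r_2(\mathbf{b}\mathbf{i}^T) = \sum_{j \in \mathbb{F}_q}\ \sum_{\mathbf{i} : \mathbf{a}\mathbf{i}^T = j}\log r_1(j)\log r_2(\mathbf{b}\mathbf{i}^T)$$
$$= \sum_{j \in \mathbb{F}_q}\log r_1(j)\sum_{\mathbf{i} : \mathbf{a}\mathbf{i}^T = j}\log r_2(\mathbf{b}\mathbf{i}^T)$$
$$= \sum_{j \in \mathbb{F}_q}\log r_1(j)\sum_{k \in \mathbb{F}_q}\ \sum_{\mathbf{i} : (\mathbf{a}\mathbf{i}^T = j \wedge \mathbf{b}\mathbf{i}^T = k)}\log r_2(k)$$
$$= \sum_{j \in \mathbb{F}_q}\log r_1(j)\sum_{k \in \mathbb{F}_q}\log r_2(k)\sum_{\mathbf{i} : (\mathbf{a}\mathbf{i}^T = j \wedge \mathbf{b}\mathbf{i}^T = k)} 1$$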
fsF^j 7'€F, vi:(ai^=;'AbF=i:) 
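Since $\mathbb{F}_q$ is a field with $q$ elements and $\mathbf{a}$, $\mathbf{b}$ are linearly independent, the innermost summation
above runs $q^{N-2}$ times for all $j$ and $k$. Therefore,

$$\sum_{\mathbf{i} \in \mathbb{F}_q^N}\log r_1(\mathbf{a}\mathbf{i}^T)\log r_2(\mathbf{b}\mathbf{i}^T) = q^{N-2}\sum_{j \in \mathbb{F}_q}\log r_1(j)\sum_{k \in \mathbb{F}_q}\log r_2(k).$$

Inserting this result into (A.2) yields

$$\langle p_1(\mathbf{x}), p_2(\mathbf{x}) \rangle = q^{N-2}\sum_{j \in \mathbb{F}_q}\log r_1(j)\sum_{k \in \mathbb{F}_q}\log r_2(k) - \frac{1}{q^N}\sum_{\mathbf{i} \in \mathbb{F}_q^N}\log r_1(\mathbf{a}\mathbf{i}^T)\sum_{\mathbf{i} \in \mathbb{F}_q^N}\log r_2(\mathbf{b}\mathbf{i}^T)$$
$$= q^{N-2}\sum_{j \in \mathbb{F}_q}\log r_1(j)\sum_{k \in \mathbb{F}_q}\log r_2(k) - \frac{1}{q^N}\Big(q^{N-1}\sum_{j \in \mathbb{F}_q}\log r_1(j)\Big)\Big(q^{N-1}\sum_{k \in \mathbb{F}_q}\log r_2(k)\Big)$$
$$= q^{N-2}\sum_{j \in \mathbb{F}_q}\log r_1(j)\sum_{k \in \mathbb{F}_q}\log r_2(k) - q^{N-2}\sum_{j \in \mathbb{F}_q}\log r_1(j)\sum_{k \in \mathbb{F}_q}\log r_2(k)$$
$$= 0,$$

which completes the proof of the second part.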
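
Both parts of the lemma can be checked on a small example. The sketch below is an illustration for $q = 3$ and $N = 2$; the helper names, the coefficient vectors and the pmfs $r_1$, $r_2$ are hypothetical choices, and the inner product is computed from centered log-probabilities.

```python
import math
from itertools import product

Q, N = 3, 2   # field size (prime, so arithmetic is mod Q) and vector length

def spc(r, a):
    """Normalized r(a x^T): pmf on F_Q^N built from a pmf r on F_Q."""
    w = {x: r[sum(ai * xi for ai, xi in zip(a, x)) % Q]
         for x in product(range(Q), repeat=N)}
    s = sum(w.values())
    return {x: v / s for x, v in w.items()}

def inner(p, r):
    """Inner product of two pmfs given as dicts over the same finite set."""
    keys = list(p)
    lp = [math.log(p[k]) for k in keys]
    lr = [math.log(r[k]) for k in keys]
    mp, mr = sum(lp) / len(keys), sum(lr) / len(keys)
    return sum((u - mp) * (v - mr) for u, v in zip(lp, lr))

r1 = {0: 0.5, 1: 0.3, 2: 0.2}
r2 = {0: 0.2, 1: 0.1, 2: 0.7}

# Linearly independent coefficient vectors: the constraints are orthogonal.
independent = inner(spc(r1, (1, 0)), spc(r2, (1, 1)))

# b = alpha * a: the inner product reduces to Q^(N-1) <r1(x), r2(alpha x)>.
alpha, a = 2, (1, 2)
b = tuple(alpha * ai % Q for ai in a)
dependent = inner(spc(r1, a), spc(r2, b))
r2_scaled = {j: r2[alpha * j % Q] for j in range(Q)}
expected = Q ** (N - 1) * inner(r1, r2_scaled)

print(independent)            # numerically zero
print(dependent, expected)    # the two values agree
```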


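A.2.2 Proof of Lemma 3.2

First we are going to prove that $\operatorname{im}\{\mathcal{S}_{\mathbf{a}}\}$ is a subspace of $\mathcal{P}_{\mathbb{F}_q^N}$ by showing that $\mathcal{S}_{\mathbf{a}}\{\cdot\}$ is a linear
mapping from $\mathcal{P}_{\mathbb{F}_q}$ to $\mathcal{P}_{\mathbb{F}_q^N}$. For any $p(x), r(x) \in \mathcal{P}_{\mathbb{F}_q}$ and $\alpha, \beta \in \mathbb{R}$,

$$\mathcal{S}_{\mathbf{a}}\{\alpha \boxtimes p(x) \boxplus \beta \boxtimes r(x)\} = \mathcal{C}_{\mathbb{F}_q^N}\Big\{\mathcal{C}_{\mathbb{F}_q}\big\{(p(x))^\alpha (r(x))^\beta\big\}\big|_{x = \mathbf{a}\mathbf{x}^T}\Big\}.$$

The inner normalization operator above can be cancelled since there is another normalization
outside.

$$\mathcal{S}_{\mathbf{a}}\{\alpha \boxtimes p(x) \boxplus \beta \boxtimes r(x)\} = \mathcal{C}_{\mathbb{F}_q^N}\big\{(p(\mathbf{a}\mathbf{x}^T))^\alpha (r(\mathbf{a}\mathbf{x}^T))^\beta\big\}.$$

Using the definition of addition and scalar multiplication on $\mathcal{P}_{\mathbb{F}_q^N}$ we obtain

$$\mathcal{S}_{\mathbf{a}}\{\alpha \boxtimes p(x) \boxplus \beta \boxtimes r(x)\} = \alpha \boxtimes \mathcal{C}_{\mathbb{F}_q^N}\{p(\mathbf{a}\mathbf{x}^T)\} \boxplus \beta \boxtimes \mathcal{C}_{\mathbb{F}_q^N}\{r(\mathbf{a}\mathbf{x}^T)\}$$
$$= \alpha \boxtimes \mathcal{S}_{\mathbf{a}}\{p(x)\} \boxplus \beta \boxtimes \mathcal{S}_{\mathbf{a}}\{r(x)\},$$

which proves that $\mathcal{S}_{\mathbf{a}}\{\cdot\}$ is a linear mapping. Since the image of any linear mapping is a
subspace of the co-domain, $\operatorname{im}\{\mathcal{S}_{\mathbf{a}}\}$ is a subspace of $\mathcal{P}_{\mathbb{F}_q^N}$.

Obviously, $\mathcal{S}_{\mathbf{a}}\{\cdot\}$ is an injective mapping for nonzero $\mathbf{a}$. Therefore,

$$\dim \operatorname{im}\{\mathcal{S}_{\mathbf{a}}\} = \dim \mathcal{P}_{\mathbb{F}_q} = q - 1,$$

which completes the proof.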


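A.2.3 Proof of Lemma 3.3

If there exists an $\alpha \in \mathbb{F}_q$ such that $\mathbf{b} = \alpha\mathbf{a}$ then for any $p(x) \in \mathcal{P}_{\mathbb{F}_q}$

$$\mathcal{S}_{\mathbf{b}}\{p(x)\} = \mathcal{C}_{\mathbb{F}_q^N}\{p(\mathbf{b}\mathbf{x}^T)\}$$
$$= \mathcal{C}_{\mathbb{F}_q^N}\{p(\alpha\mathbf{a}\mathbf{x}^T)\}$$
$$= \mathcal{S}_{\mathbf{a}}\{p(\alpha x)\}.$$

Since $p(x)$ is in $\mathcal{P}_{\mathbb{F}_q}$, $p(\alpha x)$ is also in $\mathcal{P}_{\mathbb{F}_q}$. Therefore, $\operatorname{im}\{\mathcal{S}_{\mathbf{b}}\} \subseteq \operatorname{im}\{\mathcal{S}_{\mathbf{a}}\}$. Similarly, it can be
shown that $\operatorname{im}\{\mathcal{S}_{\mathbf{a}}\} \subseteq \operatorname{im}\{\mathcal{S}_{\mathbf{b}}\}$. Consequently,

$$\operatorname{im}\{\mathcal{S}_{\mathbf{a}}\} = \operatorname{im}\{\mathcal{S}_{\mathbf{b}}\}$$

if $\mathbf{b}$ is equal to $\alpha\mathbf{a}$ for an $\alpha \in \mathbb{F}_q$.

If there is no $\alpha$ such that $\mathbf{b} = \alpha\mathbf{a}$ then for any $p_1(\mathbf{x}) \in \operatorname{im}\{\mathcal{S}_{\mathbf{a}}\}$ and $p_2(\mathbf{x}) \in \operatorname{im}\{\mathcal{S}_{\mathbf{b}}\}$

$$\langle p_1(\mathbf{x}), p_2(\mathbf{x}) \rangle = 0$$

due to Lemma 3.1. Hence,

$$\operatorname{im}\{\mathcal{S}_{\mathbf{a}}\} \perp \operatorname{im}\{\mathcal{S}_{\mathbf{b}}\}$$

if there is no $\alpha \in \mathbb{F}_q$ such that $\mathbf{b} = \alpha\mathbf{a}$.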


A.3 Proofs and Derivations in Chapter 4 

A.3.1 Proof of Lemma 4.1

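We need to show that $p(\mathbf{x})$ is orthogonal to $\mathcal{C}_{\mathbb{F}_q^N}\{r(\mathbf{a}\mathbf{x}^T)\}$ for any $r(x) \in \mathcal{P}_{\mathbb{F}_q}$.

$$\langle p(\mathbf{x}), \mathcal{C}_{\mathbb{F}_q^N}\{r(\mathbf{a}\mathbf{x}^T)\} \rangle = \sum_{\mathbf{i} \in \mathbb{F}_q^N}\log p(\mathbf{i}\mathbf{D})\log r(\mathbf{a}\mathbf{i}^T) - \frac{1}{q^N}\sum_{\mathbf{i} \in \mathbb{F}_q^N}\log p(\mathbf{i}\mathbf{D})\sum_{\mathbf{i} \in \mathbb{F}_q^N}\log r(\mathbf{a}\mathbf{i}^T) \qquad (A.3)$$

Let $\mathbf{i}$ be a vector in $\mathbb{F}_q^N$. For each vector $\mathbf{i}$ there are $q^{N-\operatorname{rank}(\mathbf{D})}$ vectors $\mathbf{j}$ in $\mathbb{F}_q^N$ satisfying the
relation

$$\mathbf{j}\mathbf{D} = \mathbf{i}\mathbf{D},$$

where $\operatorname{rank}(\mathbf{D})$ denotes the rank of the dependency matrix $\mathbf{D}$. Therefore, the first summation
in (A.3) is equal to the following nested summation.

$$\sum_{\mathbf{i} \in \mathbb{F}_q^N}\log p(\mathbf{i}\mathbf{D})\log r(\mathbf{a}\mathbf{i}^T) = \frac{1}{q^{N-\operatorname{rank}(\mathbf{D})}}\sum_{\mathbf{i} \in \mathbb{F}_q^N}\ \sum_{\mathbf{j} \in \mathbb{F}_q^N : \mathbf{i}\mathbf{D} = \mathbf{j}\mathbf{D}}\log p(\mathbf{j}\mathbf{D})\log r(\mathbf{a}\mathbf{j}^T)$$
$$= \frac{1}{q^{N-\operatorname{rank}(\mathbf{D})}}\sum_{\mathbf{i} \in \mathbb{F}_q^N}\log p(\mathbf{i}\mathbf{D})\sum_{\mathbf{j} \in \mathbb{F}_q^N : \mathbf{i}\mathbf{D} = \mathbf{j}\mathbf{D}}\log r(\mathbf{a}\mathbf{j}^T)$$

The inner summation on the right hand side above can be grouped as

$$\sum_{\mathbf{i} \in \mathbb{F}_q^N}\log p(\mathbf{i}\mathbf{D})\log r(\mathbf{a}\mathbf{i}^T) = \frac{1}{q^{N-\operatorname{rank}(\mathbf{D})}}\sum_{\mathbf{i} \in \mathbb{F}_q^N}\log p(\mathbf{i}\mathbf{D})\sum_{k \in \mathbb{F}_q}\ \sum_{\mathbf{j} \in \mathbb{F}_q^N : \mathbf{i}\mathbf{D} = \mathbf{j}\mathbf{D} \wedge \mathbf{a}\mathbf{j}^T = k}\log r(k)$$
$$= \frac{1}{q^{N-\operatorname{rank}(\mathbf{D})}}\sum_{\mathbf{i} \in \mathbb{F}_q^N}\log p(\mathbf{i}\mathbf{D})\sum_{k \in \mathbb{F}_q}\log r(k)\sum_{\mathbf{j} \in \mathbb{F}_q^N : \mathbf{i}\mathbf{D} = \mathbf{j}\mathbf{D} \wedge \mathbf{a}\mathbf{j}^T = k} 1$$

We have to determine how many times the innermost summation above runs. Let $\mathbf{d}_1, \mathbf{d}_2, \ldots,
\mathbf{d}_{\operatorname{rank}(\mathbf{D})}$ be the nonzero rows of $\mathbf{D}$. Then the innermost summation above runs once for every
vector $\mathbf{j}$ satisfying the system of linear equations below.

$$\begin{bmatrix} \mathbf{d}_1 \\ \mathbf{d}_2 \\ \vdots \\ \mathbf{d}_{\operatorname{rank}(\mathbf{D})} \\ \mathbf{a} \end{bmatrix}\mathbf{j}^T = \begin{bmatrix} \mathbf{d}_1\mathbf{i}^T \\ \mathbf{d}_2\mathbf{i}^T \\ \vdots \\ \mathbf{d}_{\operatorname{rank}(\mathbf{D})}\mathbf{i}^T \\ k \end{bmatrix}$$

Due to the definition of the dependency matrix, all nonzero rows of $\mathbf{D}$ are linearly independent.
Moreover, all these nonzero rows of $\mathbf{D}$ are also linearly independent of $\mathbf{a}$, since $\mathbf{a}$ is not equal
to $\mathbf{a}\mathbf{D}$. Therefore, the system of linear equations above has $q^{N-\operatorname{rank}(\mathbf{D})-1}$ solutions. Hence,

$$\sum_{\mathbf{i} \in \mathbb{F}_q^N}\log p(\mathbf{i}\mathbf{D})\log r(\mathbf{a}\mathbf{i}^T) = \frac{1}{q}\sum_{\mathbf{i} \in \mathbb{F}_q^N}\log p(\mathbf{i}\mathbf{D})\sum_{k \in \mathbb{F}_q}\log r(k).$$

Inserting this result into (A.3) yields

$$\langle p(\mathbf{x}), \mathcal{C}_{\mathbb{F}_q^N}\{r(\mathbf{a}\mathbf{x}^T)\} \rangle = \frac{1}{q}\sum_{\mathbf{i} \in \mathbb{F}_q^N}\log p(\mathbf{i}\mathbf{D})\sum_{k \in \mathbb{F}_q}\log r(k) - \frac{1}{q^N}\sum_{\mathbf{i} \in \mathbb{F}_q^N}\log p(\mathbf{i}\mathbf{D})\sum_{\mathbf{i} \in \mathbb{F}_q^N}\log r(\mathbf{a}\mathbf{i}^T)$$
$$= \frac{1}{q}\sum_{\mathbf{i} \in \mathbb{F}_q^N}\log p(\mathbf{i}\mathbf{D})\sum_{k \in \mathbb{F}_q}\log r(k) - \frac{q^{N-1}}{q^N}\sum_{\mathbf{i} \in \mathbb{F}_q^N}\log p(\mathbf{i}\mathbf{D})\sum_{k \in \mathbb{F}_q}\log r(k)$$
$$= 0,$$

which completes the proof.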

A.4 Proofs and Derivations in Chapter 6 


A.4.1 The factorization of the a posteriori probability of X given in Section 6.3


Expanding the absolute value in (6.3) yields


p{x) = Cjpjv exp 


- Cp^iexp 


2cr^ 


2cr^ 


N 


- ^ hipy (Xi) 


1=1 


2 M 


N 


|y|2 - 2y 2] Re {h*p, (x,)*) + Ipp, fe) 


1=1 


N 


1=1 


2 am 


Since $|y|^2$ does not depend on $\mathbf{x}$, it can be cancelled by the normalization operator, which gives

N ( N \( N AAM 


p(x) = Cp^ jexp 
= CpAiiexp 


1 


20-2 

N 


f 1=1 


Y_^2Rt[yh*pq (v;)*)- Z hiPq (V;) 'y Pq (v,) 


f 1=1 


Vi=l 


2Re [yh*Pq (V;)*} - \hiPq (V;)| ^ 2Re [hiPq (v,) h*Pq [xj) } 

2-i Orr^ 2-i 2-i 


20-2 


,1=1 “ j=2 1=1 

Since PSK is a constant amplitude modulation, \pq (v,)| is constant for all v,-. Consequently, 
\hiPq (v,)p does not depend on x. Canceling \hiPq (v,)p by the normalization operator yields 
the desired factorization. 

^ ^''>'f^£lyh*pqixi)*]]-^fi 2Re[hih*pqixi)pq(xjy^ 

1 f — 

1=1 V j=2 1=1 t 


p(x) = CpW ]~[ exp 


20-2 


(A.4) 


A.4.2 Proof of Theorem 6.1


The necessary row operations are listed below. 

1. Add E' row to (1 + it^tllyh ^w for 7 - 3 up to N. 

2. For / - 4 up to N, add + ^yh to 

(a) ((izi|iz 2 ) + 1 ) 1 /* row, 

(b) (iiziKiz 2 ) 2 ) 1 /* row, 

(c) ( d~P 6 ~ 2 ) + jyh row for 7 = 4 up to i - 1 , 


(d) + 3)'^* row, 


(e) 


+ 1 + /)'* row for 7 = / + 2 up to N. 


A.4.3 Derivation of the factorization in (6.51)


Pr{X = x|Y ^ y) = C^in, {exp 


lly - H,w|p 

2cr2 


exp (-^ (||y||2 - 2Re {w'^Hfy) + w"HfH,w) 


We can cancel $\|\mathbf{y}\|^2$ since it is constant for all $\mathbf{x}$. Let $\mathbf{u} = \mathbf{H}_c^H\mathbf{y}$ and $\mathbf{R} = \mathbf{H}_c^H\mathbf{H}_c$. Then the
factorization becomes


Pr{X-x|Y = y} oc exp (-2Re {w^u) + w^Rw) 


exp 


N, 


2 cr2 


k=\ 


N, N, 

2 ^ Re {y (x^) (xj,)* {^)kjv (x/) 

k=i 1=1 




where $u_k$ is the $k$-th component of $\mathbf{u}$ and $(\mathbf{R})_{k,l}$ is the entry in the $k$-th row and $l$-th column of the
matrix $\mathbf{R}$. Since $\mathbf{R}$ is Hermitian symmetric,


Pr{X = x|Y = y) oc exp 


• exp 


1 


N, 


2(t^ 


\\ 


2] (2Re {y (x,) u*) - ||y (x,)||" iR)k,k) 

.k=i 


, f N, k-1 


2(t^ 


\\ 


2]2]2Re|y(x,r (R)wv(x/)} 


\k=2 1=1 


Since ||y(x,t)|f is constant. 


Pr{X = x|Y = y) oc exp 


20-2 


N, N, k-1 \\ 

2 2Re {y (x,) 2Re |y (x,)* (R), 7 y (x,)| . 

t:=l 


N, k-1 

ZZ 

k=2 1=1 


. (A.5) 


The function vixj^) can be expressed in terms of /?(.) function as 


y (x;.) ^ ap{X 2 k-i) + a*fi{x 2 k), 


where a = ^ + j^. Therefore, 


Re{y(Xi)M^} ^ Re{aM^}^(x2^-!) + Re{a*M^}yS(x2^). 


(A.6) 
Furthermore,


Re|y(x^:)* (R)^.,/v(x/)| 


Re |(a*y6 {x2k-\) + afS {x 2 k)){^)k,i{aP (^2/-i) + ap{x2i))] 

Re j|a |2 (R)i,,^(x2;t-i)/3U2/-i)) + Re (R)vy6U2^-i)/?te/)) 

+Re \a^{K)k,iP{x2k)P{x2i-\)] + Re [\af (R)kjPix2k)P{x2i)} 

^(J 3 ix2k-i + 2C2/-i) Re |(R)^,/! +Pix2k-1 + X2i) Im |(R)^,/) 

-p{x2k + JC2/-1) Im |(R)^,/) +p{x2k + X21) Re {(R)*,/)). (A. 7 ) 


Inserting (IA.6I) and (IA. 7 I) together with the definition of the -)/(.;.) function into (IA. 5 I) gives 
the desired factorization. 


Pr{X - x|Y - y) cx 


N, 

y\x2k-\\ 

k=\ 

N, k-l 


Re {uk} + Im {uk) 


k=2 l=l 
N, k-l 

nn 

k=2 1=1 




,o-\y\X2k', 


Re|(Rk/l 


Re {uk) - Im {uk} 


-,cr 


,cr y X2^:-1 +X2/;- 


Im|(R)^,/) 


-,0-1 


y\X2k + X21-1-, 


Im|(R)r.,/| 


,o-\y\X 2 k + X 2 r,' 


Re|(Rk/l 


, cr 


(A.8) 


A.4.4 Permutations used in the simulation in Section 6.6.5


The set P consists of the following permutations. 


P' - 1 

fT fT fT fT vT vT vT vT vT fT vT vT fT fT fT fT 1 

^1 ‘3 ‘5 ‘7 ‘9 ‘11 ‘13 ‘15 ‘2 M ^6 ^8 ‘10 ‘12 ^14 ‘l 6 j 

to 

II 

fT fT fT fT fT fT fT fT fT fT fT fT fT i»7 fT 1 

_‘l ‘3 ‘5 ‘7 ‘9 ‘11 ‘13 ‘15 M ^6 ^8 ‘iO ‘12 ^14 ‘I6 ‘2 J 

II 

fT fT |»r fT fT |»r |»r 1 

‘3 ‘5 ‘7 ‘9 ‘11 ‘13 ‘15 ^6 ^8 ‘10 ‘12 ^14 ‘I6 ‘2 M J 

P4 = 1 

l»7’ |»7' 1*7’ 1*7’ 1*7' 1*7' i?7’ i?7’ i»7’ 1*7’ i?7’ 1 

^3 ^5 ^7 ^9 ^11 ^13 ^15 ^8 ‘lO ^12 ^14 ^16 h M ^6 J 

II 

[1*7 1*7’ |»7 |?7’ |»7 1*7’ |»r |»7 i?7 |?7’1 

[‘i ‘3 ‘5 ‘7 ‘9 ‘11 ‘13 ‘15 ‘10 ‘12 ^14 ‘I6 ‘2 M ^6 ^8 J 

II 

[fJ fT’ f7 fr fJ fT’ fT’ f7 f7 f7 fT’ fT’ f7 fT’ f7 fT’ 1 

[‘i ‘3 ‘5 ‘7 ‘9 ‘li ‘13 ‘15 ‘12 ^14 ‘I6 ^2 M ^6 ^8 ‘lOj 

II 

l^T fT fT fT fT fT fT fT fT fT fT fT fT fT fT fT 1 

[‘1 ‘3 ‘5 ‘7 ‘9 ‘11 ‘13 ‘15 ^14 ‘16 ‘2 M ^6 ^8 ‘10 ‘12J 

00 

II 

Ffr 1*7’ fT fT fT fT fT fT fT fT fT fT fT fT fT fT 1 

[‘l ‘3 ‘5 ‘7 ‘9 ‘11 ‘13 ‘15 ‘16 ‘2 M ^6 ^8 ‘10 ‘12 ‘14J 
II

[*4 *6 *8 *10 *12 *14 *16 

Pio - 

Ur 

[*6 *8 *10 *12 *14 *16 *2 

II 

[*8 *10 *12 *14 *16 *2 *4 

Pl2 - 

tfr fT fT fT fT fT fT 
[*10 *12 *14 *16 *2 *4 *6 

Pl3 - 

Ifr fT fT fT fT fT fT 
[*12 *14 *16 *2 *4 *6 *8 

II 

fT" fT fT fT fT fT 
[*14 *16 *2 *4 *6 *8 *10 

Pl5 - 

Ifr fT fT fT fT fT fT 
1*16 *2 *4 *6 *8 *10 *12 


1*7’ 1*7’ |»r |»r 1*7’ |»7’ 1*7’ 1 

h ^3 ^5 ^7 ^9 ‘li ^13 ^15 J 

fT fT fT fT fT fT fT fT fT 1 

M ^3 ^5 ^7 ^9 ^11 ^13 ^15 J 

fT fT fT fT fT fT fT fT fT 1 

^6 ^1 ^3 ^5 ^7 ^9 ^11 ^13 ^15 J 

fT fT fT fT fT fT fT fT fT 1 

^8 ^1 ^3 ^5 ^7 ^9 ^11 ^13 ^15 J 

1*7’ fT fT fT fT fT fT fT fT 1 

‘lO ^3 ^5 ^7 ^9 ^11 ^13 ^15 J 

fT fT fT fT fT fT fT fT fT 1 

^12 ^3 ^5 ^7 ^9 ^11 ^13 ^15 J 

1*7’ fT fT fT fT fT fT fT fT 1 

^14 ^3 ^5 ^7 ^9 ‘li ^13 ^15 J 


A.5 Proofs and Derivations in Chapter 7 

A.5.1 Proof of Theorem 7.1

The proof in the forward direction is actually an implication of the cut-set independence 


theorem stated in 121. An alternative proof is 

given below. 


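Let $t_k(\mathbf{x})$ and $t_l(\mathbf{x})$ be defined as

$$t_k(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{\mathbf{a}_i \in \mathcal{H}_k} r_i(\mathbf{a}_i\mathbf{x}^T)\Big\}, \qquad (A.9)$$
$$t_l(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\Big\{\prod_{\mathbf{a}_i \in \mathcal{H}_l \setminus \mathcal{H}_k} r_i(\mathbf{a}_i\mathbf{x}^T)\Big\}. \qquad (A.10)$$

Clearly,

$$p(\mathbf{x}) = \mathcal{C}_{\mathbb{F}_q^N}\{t_k(\mathbf{x})\,t_l(\mathbf{x})\}. \qquad (A.11)$$

Due to the definitions of $\mathcal{H}_k$ and $\mathcal{H}_l$, $t_k(\mathbf{x})$ and $t_l(\mathbf{x})$ satisfy

$$t_k(\mathbf{x}) = t_k(\mathbf{x}(\mathbf{I} - \mathbf{E}_k)), \qquad (A.12)$$
$$t_l(\mathbf{x}) = t_l(\mathbf{x}(\mathbf{I} - \mathbf{E}_l)), \qquad (A.13)$$

where $\mathbf{E}_k$ ($\mathbf{E}_l$) is the dependency matrix with just a single 1 on the $k$-th ($l$-th) entry of its main diagonal.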

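An equivalent requirement on conditional independence can be obtained by multiplying both
sides of (7.1) with $\Pr\{\mathbf{X}_{\setminus\{k,l\}} = \mathbf{x}_{\setminus\{k,l\}}\}\Pr\{\mathbf{X}_{\setminus\{k\}} = \mathbf{x}_{\setminus\{k\}}\}$ as follows.

$$p(\mathbf{x})\Pr\{\mathbf{X}_{\setminus\{k,l\}} = \mathbf{x}_{\setminus\{k,l\}}\} = \Pr\{\mathbf{X}_{\setminus\{k\}} = \mathbf{x}_{\setminus\{k\}}\}\Pr\{\mathbf{X}_{\setminus\{l\}} = \mathbf{x}_{\setminus\{l\}}\} \qquad (A.14)$$

The marginal distribution $\Pr\{\mathbf{X}_{\setminus\{k\}} = \mathbf{x}_{\setminus\{k\}}\}$ can be derived in terms of $t_k(\mathbf{x})$ and $t_l(\mathbf{x})$ as

$$\Pr\{\mathbf{X}_{\setminus\{k\}} = \mathbf{x}_{\setminus\{k\}}\} = \sum_{x_k \in \mathbb{F}_q} p(\mathbf{x})$$
$$= \mathcal{C}_{\mathbb{F}_q^{N-1}}\Big\{\sum_{x_k \in \mathbb{F}_q} t_k(\mathbf{x})\,t_l(\mathbf{x})\Big\}$$
$$= \mathcal{C}_{\mathbb{F}_q^{N-1}}\Big\{t_k(\mathbf{x})\sum_{x_k \in \mathbb{F}_q} t_l(\mathbf{x})\Big\}, \qquad (A.15)$$

where the last line follows from (A.12). The other marginal distributions in (A.14) can similarly
be derived as

$$\Pr\{\mathbf{X}_{\setminus\{l\}} = \mathbf{x}_{\setminus\{l\}}\} = \mathcal{C}_{\mathbb{F}_q^{N-1}}\Big\{t_l(\mathbf{x})\sum_{x_l \in \mathbb{F}_q} t_k(\mathbf{x})\Big\}, \qquad (A.16)$$
$$\Pr\{\mathbf{X}_{\setminus\{k,l\}} = \mathbf{x}_{\setminus\{k,l\}}\} = \mathcal{C}_{\mathbb{F}_q^{N-2}}\Big\{\sum_{x_k \in \mathbb{F}_q} t_l(\mathbf{x})\sum_{x_l \in \mathbb{F}_q} t_k(\mathbf{x})\Big\}. \qquad (A.17)$$

Inserting (A.11), (A.15), (A.16), and (A.17) into (A.14) verifies that the equality in (A.14)
holds and completes the proof in the forward direction.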

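The proof in the backward direction starts with multiplying both sides of (7.1) with
$\Pr\{\mathbf{X}_{\setminus\{k\}} = \mathbf{x}_{\setminus\{k\}}\}$, which yields

$$p(\mathbf{x}) = \Pr\{\mathbf{X}_{\setminus\{k\}} = \mathbf{x}_{\setminus\{k\}}\}\Pr\{X_k = x_k \mid \mathbf{X}_{\setminus\{k,l\}} = \mathbf{x}_{\setminus\{k,l\}}\}$$
$$= \mathcal{C}_{\mathbb{F}_q^N}\{m_k(\mathbf{x})\,m_l(\mathbf{x})\}, \qquad (A.18)$$

where $m_k(\mathbf{x})$ and $m_l(\mathbf{x})$ are $\mathcal{C}_{\mathbb{F}_q^N}\{\Pr\{\mathbf{X}_{\setminus\{k\}} = \mathbf{x}_{\setminus\{k\}}\}\}$ and
$\mathcal{C}_{\mathbb{F}_q^N}\{\Pr\{X_k = x_k \mid \mathbf{X}_{\setminus\{k,l\}} = \mathbf{x}_{\setminus\{k,l\}}\}\}$, respectively. Clearly, these functions satisfy

$$m_k(\mathbf{x}) = m_k(\mathbf{x}(\mathbf{I} - \mathbf{E}_k)), \qquad (A.19)$$
$$m_l(\mathbf{x}) = m_l(\mathbf{x}(\mathbf{I} - \mathbf{E}_l)). \qquad (A.20)$$

Then due to Theorem 4.4, $p(\mathbf{x})$ can be factored as in (7.2).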